ReLu

Activation functions introduce non-linearity into deep neural networks so that they can model real-world data. One of the most popular activation functions is the rectified linear unit (ReLu).

The function is defined as \(f(x)=\max(0,x)\). Thus the forward pass is simply \(y_i=\max(0, x_i)\).
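The forward pass can be sketched in plain NumPy (a minimal stand-alone example, not tied to Tinynet's backend):

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
# Elementwise max(0, x): negative entries are clamped to zero,
# non-negative entries pass through unchanged.
y = np.maximum(0, x)
# y is [0., 0., 0., 1.5, 3.]
```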

The ReLu function has no weights or biases to update, so in the backward pass we only need to propagate the gradient to the previous layer. By the chain rule, \(\frac{\partial{l}}{\partial{x_i}}=\frac{\partial{l}}{\partial{y_i}}\frac{\partial{y_i}}{\partial{x_i}}\).

Then we have:

\[\begin{split}\frac{\partial{l}}{\partial{x_i}}= \begin{cases} 0 & x_i<0 \\ \frac{\partial{l}}{\partial{y_i}} & x_i>0 \\ \text{undefined} & x_i=0 \end{cases}\end{split}\]

We see that the derivative is not defined at the point \(x_i=0\). In practice, however, we can set it to \(0\), \(1\), or any value in between; each of these is a valid subgradient, and the choice rarely matters because the input is exactly zero with negligible probability.
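The backward pass then reduces to masking the upstream gradient. A minimal NumPy sketch, using the common convention of a zero derivative at \(x_i=0\):

```python
import numpy as np

x = np.array([-1.0, 0.0, 2.0])
upstream = np.array([0.5, 0.5, 0.5])  # dl/dy from the next layer

# (x > 0) is a boolean mask; it is False at x == 0,
# which picks the subgradient 0 at the kink.
grad_x = upstream * (x > 0)
# grad_x is [0., 0., 0.5]
```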

The implementation of the ReLu layer in Tinynet is shown below:

from .base import Layer
from tinynet.core import Backend as np

class ReLu(Layer):
    '''
    ReLu layer performs the rectified linear unit operation.
    '''
    def __init__(self, name):
        super().__init__(name)
        self.type='ReLu'

    def forward(self, input):
        '''
        In the forward pass, the output is defined as :math:`y=max(0, x)`.
        '''
        self.input = input
        return input * (input > 0)

    def backward(self, in_gradient):
        '''
        In the backward pass, the upstream gradient is passed through
        where the cached input was positive and zeroed elsewhere.
        '''
        return in_gradient * (self.input > 0)
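To see the layer in action without pulling in the rest of Tinynet, here is a stand-alone stand-in with the same `forward`/`backward` bodies (the `Layer` base class and `name` argument are omitted for brevity):

```python
import numpy as np

class ReLu:
    '''Stand-alone stand-in for the Tinynet ReLu layer above.'''
    def forward(self, input):
        self.input = input            # cache input for the backward pass
        return input * (input > 0)    # zero out negative entries

    def backward(self, in_gradient):
        return in_gradient * (self.input > 0)

layer = ReLu()
x = np.array([[-1.0, 2.0], [3.0, -0.5]])
y = layer.forward(x)
grad = layer.backward(np.ones_like(y))
# y    is [[0., 2.], [3., 0.]]
# grad is [[0., 1.], [1., 0.]]
```

Note that `forward` must run before `backward`, since the mask is built from the cached input rather than the output.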