Dropout Layer

In deep neural networks, overfitting can occur when the network is complex and has many parameters. In Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava et al. proposed a simple technique named Dropout to prevent overfitting. It randomly drops out some neurons in the network during training, which is equivalent to training a different network architecture on every batch.

The parameter for this layer is a preset probability \(p\), the probability of retaining a neuron (this is the convention used in the paper); equivalently, each neuron is dropped with probability \(1-p\). For example, if \(p=0.5\), then every neuron in this layer has a \(0.5\) chance of being kept. With this probability, we can define the dropout layer as a function \(y_i=f(x_i)\) such that

\[\begin{split}y_i= \begin{cases} x_i & \text{$r_i<p$} \\ 0 & \text{$r_i\geq p$} \\ \end{cases}\end{split}\]

Here \(r_i\) is drawn uniformly at random from \([0,1]\). However, with this definition the expected output of the dropout layer is scaled by \(p\). For example, if the original output is \(1\) and \(p=0.5\), the expected output becomes \(0.5\). This is unsatisfactory because at test time, when dropout is disabled, we do not want the output to be scaled. Thus, in practice we define the function to be

\[\begin{split}y_i= \begin{cases} x_i/p & \text{$r_i<p$} \\ 0 & \text{$r_i\geq p$} \\ \end{cases}\end{split}\]
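
To make the scaling concrete, here is a minimal NumPy sketch (not tinyml code) of the scaled rule: with all inputs equal to \(1\) and \(p=0.5\), the empirical mean of the output stays close to \(1\).

import numpy as np

rng = np.random.default_rng(0)

p = 0.5                      # keep probability (illustrative value)
x = np.ones(100000)          # pretend activations, all equal to 1

# r_i ~ Uniform(0, 1): keep and scale by 1/p when r_i < p, drop otherwise
r = rng.uniform(size=x.shape)
y = np.where(r < p, x / p, 0.0)

print(y.mean())              # roughly 1.0, so the expectation is preserved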

Then the backward computation becomes straightforward:

\[\begin{split}\frac{\partial l}{\partial x_i}= \begin{cases} \frac{\partial l}{\partial y_i}\times\frac{\partial y_i}{\partial x_i}=\frac{1}{p}\frac{\partial l}{\partial y_i} & \text{$r_i<p$} \\ 0 \times \frac{\partial l}{\partial y_i}=0 & \text{$r_i\geq p$} \\ \end{cases}\end{split}\]
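
The same mask sampled in the forward pass is reused in the backward pass. The following NumPy sketch (again independent of tinyml) shows the gradient being zeroed for dropped neurons and scaled by \(1/p\) for kept ones.

import numpy as np

rng = np.random.default_rng(0)

p = 0.5
x = rng.normal(size=(2, 4))

# Sample the mask once; it is shared between the forward and backward passes.
mask = (rng.uniform(size=x.shape) < p) / p   # 1/p where kept, 0 where dropped

y = x * mask                                 # forward pass

dl_dy = np.ones_like(y)                      # pretend upstream gradient
dl_dx = dl_dy * mask                         # backward pass: reuse the same mask
print(dl_dx)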

The implementation of the dropout layer in tinyml is as follows:

from tinyml.core import Backend as np

from .base import Layer


class Dropout(Layer):
    '''
    Dropout layer: randomly drops some neurons during the forward pass.
    '''
    def __init__(self, name, probability):
        super().__init__(name)
        self.probability = probability
        self.type = 'Dropout'

    def forward(self, input):
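        # `probability` is the keep probability: each mask entry is
        # 1 / probability with chance `probability` (kept) and 0 otherwise
        # (dropped), so the expected output equals the input.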
        self.mask = np.random.binomial(1, self.probability,
                                       size=input.shape) / self.probability
        return (input * self.mask).reshape(input.shape)

    def backward(self, in_gradient):
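        # Reuse the forward mask: dropped neurons get zero gradient, kept
        # neurons pass the upstream gradient scaled by 1 / probability.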
        return in_gradient * self.mask
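
A short usage sketch follows. The import path tinyml.layers.Dropout is an assumption (the snippet above only shows the class body), and Backend is assumed to be NumPy-compatible, so plain NumPy arrays are used as inputs.

import numpy as np

from tinyml.layers import Dropout  # assumed import path; adjust to the real module

layer = Dropout('dropout_1', probability=0.5)

x = np.ones((2, 4), dtype=np.float32)
y = layer.forward(x)               # kept entries become 1 / 0.5 = 2.0, dropped ones 0
print(y)

grad = layer.backward(np.ones_like(y))
print(grad)                        # the same mask is applied to the upstream gradient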