Max Pooling Layer

The pooling layer is another important component of convolutional neural networks. Pooling can be performed in several ways, such as max pooling and average pooling. In this part, we will only discuss the max-pooling layer, as it is the most commonly used in convolutional neural networks.

A max-pooling layer also uses a spatially small sliding window, called the kernel. Within the window, only the largest value is retained; all other values are dropped. For example, assume we have

\[\begin{split}A=\left[ {\begin{array}{*{20}c} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array} } \right]\end{split}\]

and a \(2\times 2\) max-pooling kernel with stride 1. Then the output \(C\) will be

\[\begin{split}C=\left[ {\begin{array}{*{20}c} 5 & 6 \\ 8 & 9 \\ \end{array} } \right]\end{split}\]
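This example can be checked directly with a few lines of NumPy (a minimal sketch; the \(2\times 2\) window slides with stride 1):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

k = 2       # kernel size
stride = 1
out_h = (A.shape[0] - k) // stride + 1
out_w = (A.shape[1] - k) // stride + 1

C = np.empty((out_h, out_w), dtype=A.dtype)
for i in range(out_h):
    for j in range(out_w):
        # keep only the largest value in each 2x2 window
        C[i, j] = A[i*stride:i*stride+k, j*stride:j*stride+k].max()

print(C)  # [[5 6]
          #  [8 9]]
```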

Given the kernel width \(K_w\) and height \(K_h\), we can formalize the max-pooling operation as

\[\begin{split}f(x_{ij})= \begin{cases} x_{ij} & \text{$x_{ij}\geq x_{mn}, \forall m\in [i-K_h, i+K_h], n\in [j-K_w,j+K_w]$} \\ 0 & \text{otherwise} \\ \end{cases}\end{split}\]

Hence we can compute the derivative as below:

\[\begin{split}\frac{\partial l}{\partial x_{ij}}=\frac{\partial l}{\partial f}\frac{\partial f}{\partial x_{ij}}= \begin{cases} \frac{\partial l}{\partial f} & \text{$x_{ij}\geq x_{mn}, \forall m\in [i-K_h, i+K_h], n\in [j-K_w,j+K_w]$} \\ 0 & \text{otherwise} \\ \end{cases}\end{split}\]
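In other words, the upstream gradient is routed only to the position that held the window's maximum; every other position receives zero. A minimal sketch for a single \(2\times 2\) window:

```python
import numpy as np

window = np.array([[1.0, 5.0],
                   [3.0, 2.0]])
upstream = 0.7  # dl/df for the output element produced by this window

grad = np.zeros_like(window)
# only the argmax position receives the upstream gradient
idx = np.unravel_index(np.argmax(window), window.shape)
grad[idx] = upstream

print(grad)  # [[0.  0.7]
             #  [0.  0. ]]
```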

The implementation of the max-pooling layer in Tinynet is shown below:

from .base import Layer
from tinynet.core import Backend as np
from .convolution import im2col_indices, col2im_indices

class MaxPool2D(Layer):
    '''
    Perform Max pooling, i.e. select the max item in a sliding window.
    '''
    def __init__(self, name, input_dim, size, stride, return_index=False):
        super().__init__(name)
        self.type = 'MaxPool2D'
        self.input_channel, self.input_height, self.input_width = input_dim
        self.size = size
        self.stride = stride
        self.return_index = return_index
        # Output spatial size; it must divide evenly, otherwise the settings are invalid.
        self.out_height = (self.input_height - size[0]) / stride + 1
        self.out_width = (self.input_width - size[1]) / stride + 1
        if not self.out_height.is_integer() or not self.out_width.is_integer():
            raise Exception("[Tinynet] Invalid dimension settings!")
        self.out_width = int(self.out_width)
        self.out_height = int(self.out_height)
        self.out_dim = (self.input_channel, self.out_height, self.out_width)

    def forward(self, input):
        self.num_of_entries = input.shape[0]
        # Fold the channel axis into the batch axis so each channel is
        # pooled independently: (N, C, H, W) -> (N*C, 1, H, W).
        input_reshaped = input.reshape(input.shape[0] * input.shape[1], 1, input.shape[2], input.shape[3])
        # Each column of input_col holds one pooling window.
        self.input_col = im2col_indices(input_reshaped, self.size[0], self.size[1], padding=0, stride=self.stride)
        # Index of the maximum within each window, cached for the backward pass.
        self.max_indices = np.argmax(self.input_col, axis=0)
        self.total_count = list(range(0, self.max_indices.size))
        # Select the maximum of every window, then restore (N, C, H', W').
        output = self.input_col[self.max_indices, self.total_count]
        output = output.reshape(self.out_height, self.out_width, self.num_of_entries, self.input_channel).transpose(2, 3, 0, 1)
        indices = self.max_indices.reshape(self.out_height, self.out_width, self.num_of_entries, self.input_channel).transpose(2, 3, 0, 1)
        if self.return_index:
            return output, indices
        else:
            return output

    def backward(self, in_gradient):
        # Start from zeros: non-maximum positions receive no gradient.
        gradient_col = np.zeros_like(self.input_col)
        # Flatten the upstream gradient into the same column ordering used in
        # forward(), then scatter it to the argmax position of each window.
        gradient_flat = in_gradient.transpose(2, 3, 0, 1).ravel()
        gradient_col[self.max_indices, self.total_count] = gradient_flat
        shape = (self.num_of_entries * self.input_channel, 1, self.input_height, self.input_width)
        # Fold the columns back into image form and undo the channel folding.
        out_gradient = col2im_indices(gradient_col, shape, self.size[0], self.size[1], padding=0, stride=self.stride).reshape(self.num_of_entries, self.input_channel, self.input_height, self.input_width)
        return out_gradient
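The forward pass relies on an im2col-style trick: channels are folded into the batch axis, every pooling window becomes one column of a matrix, and a single `argmax` over axis 0 then finds each window's maximum. A self-contained NumPy sketch of the idea, where `pool_cols` is a hypothetical, loop-based stand-in for `im2col_indices` (tinynet's actual column ordering may differ):

```python
import numpy as np

def pool_cols(x, k, stride):
    """Gather each k x k window of a single-channel image into one
    column of a (k*k, num_windows) matrix."""
    H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
    n = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, n] = x[i*stride:i*stride+k, j*stride:j*stride+k].ravel()
            n += 1
    return cols, out_h, out_w

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
cols, out_h, out_w = pool_cols(x, k=2, stride=1)

# As in forward(): argmax over axis 0 picks the max inside each window,
# and fancy indexing selects it from every column at once.
max_idx = np.argmax(cols, axis=0)
out = cols[max_idx, np.arange(cols.shape[1])].reshape(out_h, out_w)
print(out)  # [[5 6]
            #  [8 9]]
```

Caching `max_idx` is what makes the backward pass cheap: scattering the upstream gradient into a zero matrix at those same positions recovers exactly the routing rule derived above.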