Transposed Convolutional Layer

Definition In Section 3.1.2 Convolution, we unrolled the filter from a \(2\times 2\) matrix into a \(4\times 9\) matrix so that the convolution could be performed as a matrix multiplication. That convolution maps a \(3\times 3\) input to a \(2\times 2\) output. The deconv (transposed convolution) operation reverses the convolution in terms of shape: in our example, it maps a \(2\times 2\) input to a \(3\times 3\) output. It is not a true inverse, because the deconv operation does not guarantee that the output has the same values as the original matrix. Below we show how it is computed in the forward pass.

Forward Pass To compute the forward pass of the deconv operation, we simply transpose the unrolled filter matrix; in our case the \(4\times 9\) matrix becomes a \(9\times 4\) matrix. We can then define the deconv operation as \(X=(W^*)^T Y\), i.e. we multiply the transposed, unrolled filter matrix by the output of the convolution operation, flattened into a column vector.

We assume that we have an input \(Y\) (identical to the output of the convolution operation in our previous example, hence the notation \(Y\)) and the same filter \(W\):

\[\begin{split}Y=\left[ {\begin{array}{*{20}c} 37 & 47 \\ 67 & 77 \end{array} } \right], W=\left[ {\begin{array}{*{20}c} 1 & 2 \\ 3 & 4 \end{array} } \right]\end{split}\]

We want a \(3\times 3\) matrix as the output of the deconv operation. Recall that we unrolled the filter into the matrix

\[\begin{split}W^*=\left[ {\begin{array}{*{20}c} 1 & 2 & 0 & 3 & 4 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 & 3 & 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 2 & 0 & 3 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 & 0 & 3 & 4 \end{array} } \right]\end{split}\]
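The unrolled matrix can also be built programmatically. The following is a minimal sketch, assuming stride 1 and no padding; the helper name `unroll_filter` is hypothetical and is not part of the library code shown on this page.

```python
import numpy as np


def unroll_filter(w, in_h, in_w):
    """Unroll a 2-D filter into a (out_h*out_w, in_h*in_w) matrix.

    Assumes stride 1 and no padding. Each row holds the filter placed at
    one output position on an otherwise zero input-sized canvas.
    """
    k_h, k_w = w.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    unrolled = np.zeros((out_h * out_w, in_h * in_w), dtype=w.dtype)
    for i in range(out_h):
        for j in range(out_w):
            canvas = np.zeros((in_h, in_w), dtype=w.dtype)
            canvas[i:i + k_h, j:j + k_w] = w
            unrolled[i * out_w + j] = canvas.ravel()
    return unrolled


W = np.array([[1, 2], [3, 4]])
W_star = unroll_filter(W, 3, 3)
print(W_star.shape)  # (4, 9)
print(W_star)
```

For the \(2\times 2\) filter and \(3\times 3\) input of this example, the rows of `W_star` match the matrix \(W^*\) above.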

We obtain the desired matrix by first transposing the unrolled filter matrix and then multiplying it by our input, flattened into a \(4\times 1\) vector:

\[X=(W^*)^TY_{4\times 1}=[37, 121, 94, 178, 500, 342, 201, 499, 308]^T\]

Then we can reshape it back into a \(3\times 3\) matrix as \(X_{3\times 3}=\left[ {\begin{array}{*{20}c}37 & 121 & 94 \\178 & 500 & 342 \\201 & 499 & 308 \end{array} } \right]\)
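The arithmetic above can be checked with a few lines of NumPy. This is only a sketch of the worked example, not the library implementation. The vector `x_assumed` (the values 1 to 9) is an assumption: the earlier section is not reproduced here, and these values are used only because they yield the given \(Y\) under the unrolled convolution.

```python
import numpy as np

# Unrolled filter W* from the text (4 output positions x 9 input positions).
W_star = np.array([
    [1, 2, 0, 3, 4, 0, 0, 0, 0],
    [0, 1, 2, 0, 3, 4, 0, 0, 0],
    [0, 0, 0, 1, 2, 0, 3, 4, 0],
    [0, 0, 0, 0, 1, 2, 0, 3, 4],
])
Y = np.array([[37, 47], [67, 77]])

# Deconv forward pass: X = (W*)^T Y, with Y flattened to a length-4 vector.
x_flat = W_star.T @ Y.reshape(-1)
X = x_flat.reshape(3, 3)
print(X)

# Assumption: an input of 1..9 reproduces Y under the unrolled convolution.
x_assumed = np.arange(1, 10)
print(np.array_equal(W_star @ x_assumed, Y.reshape(-1)))  # True
# The deconv output has the same shape as that input, but not the same values.
print(np.array_equal(x_flat, x_assumed))  # False
```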

As this example shows, the deconv operation does not guarantee that we recover the input of the convolution operation; it only guarantees a matrix with the same shape as that input. Since the entries may exceed the maximum light intensity, i.e. \(255\), we need to renormalize every entry into the range \([0,255]\) when visualizing the deconv result.
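One possible way to do this renormalization is min-max scaling. The text does not specify the method, so the sketch below is an assumption, and the helper name `rescale_to_uint8` is hypothetical.

```python
import numpy as np


def rescale_to_uint8(x):
    """Min-max scale an array into [0, 255] and convert it to uint8."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return np.zeros(x.shape, dtype=np.uint8)
    scaled = (x - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


X = np.array([[37, 121, 94], [178, 500, 342], [201, 499, 308]])
img = rescale_to_uint8(X)
print(img.min(), img.max())  # 0 255
```

With this choice the smallest entry (37) maps to 0 and the largest (500) maps to 255. Clipping to \([0,255]\) would be a different choice that preserves small values but saturates large ones.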

from .base import Layer
from tinynet.core import Backend as np


def im2rows(input, inp_shape, filter_shape, dilation, stride, dilated_shape, padding, res_shape):
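    """
    Gradient transformation for the im2rows operation: scatter-add the rows
    of `input` back into an image-shaped array.
    :param input: The matrix to scatter, one row of filter-sized values per position
    :param inp_shape: The shape of the array to build (batch, channels, height, width)
    :param filter_shape: The shape of the filter (num_filters, depth, height, width)
    :param dilation: The dilation for the filter
    :param stride: The stride for the filter, as (stride_rows, stride_cols)
    :param dilated_shape: The dilated shape of the filter
    :param padding: The padding to strip from the result (only padding=1 is handled)
    :param res_shape: The (rows, cols) grid of positions that the rows of `input` come from
    :return: The scattered values, in the shape of the image
    """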
    dilated_rows, dilated_cols = dilated_shape
    num_rows, num_cols = res_shape
    res = np.zeros(inp_shape, dtype=input.dtype)
    input = input.reshape(
        (input.shape[0], input.shape[1], filter_shape[1], filter_shape[2], filter_shape[3]))
    for it in range(num_rows * num_cols):
        # Recover the 2-D position from the flat, row-major index:
        # i for rows
        # j for columns
        # (dividing by num_cols keeps this correct for non-square inputs)
        i = it // num_cols
        j = it % num_cols
        # accessing via colons: [start:end:step]
        # commas are for different dimensions
        res[:, :, i * stride[0]:i * stride[0] + dilated_rows:dilation,
            j * stride[1]:j * stride[1] + dilated_cols:dilation] += input[:, it, :, :, :]
    if (padding != 0):
        # TODO: this only works for pad=1, right now.
        # remove the padding regions
        res = np.delete(res, 0, 2)
        res = np.delete(res, res.shape[2]-1, 2)
        res = np.delete(res, 0, 3)
        res = np.delete(res, res.shape[3]-1,3)
    return res


class Deconv2D(Layer):
    '''
    Deconv2D performs the deconvolution operation, also called transposed convolution.
    '''

    def __init__(self, name, input_dim, n_filters, h_filter, w_filter, stride, dilation=1, padding=0):
        '''
        :param name: the name of the layer
        :param input_dim: the input dimension, in the format of (C,H,W)
        :param n_filters: the number of convolution filters
        :param h_filter: the height of the filter
        :param w_filter: the width of the filter
        :param stride: the stride for forward convolution
        :param dilation: the dilation factor for the filters, =1 by default.
        :param padding: the padding to strip from the output, =0 by default.
        '''
        super().__init__(name)
        self.type = 'Deconv2D'
        self.input_channel, self.input_height, self.input_width = input_dim
        self.n_filters = n_filters
        self.h_filter = h_filter
        self.w_filter = w_filter
        self.stride = stride
        self.dilation = dilation
        self.padding = padding
        weight = np.random.randn(
            self.n_filters, self.input_channel, self.h_filter, self.w_filter) / np.sqrt(self.n_filters/2.0)
        bias = np.zeros((self.n_filters, 1))
        self.weight = self.build_param(weight)
        self.bias = self.build_param(bias)

    def forward(self, input):
        filter_shape = self.weight.tensor.shape
        dilated_shape = (
            (filter_shape[2] - 1) * self.dilation + 1, (filter_shape[3] - 1) * self.dilation + 1)
        res_shape = (
            (self.input_height - 1) * self.stride + dilated_shape[0],
            (self.input_width - 1) * self.stride + dilated_shape[1]
        )
        input_mat = input.reshape(
            (input.shape[0], input.shape[1], -1)).transpose((0, 2, 1))
        filters_mat = self.weight.tensor.reshape(
            self.input_channel, -1)
        res_mat = np.matmul(input_mat, filters_mat)

        return im2rows(
            res_mat,
            (input.shape[0], filter_shape[1], res_shape[0], res_shape[1]),
            filter_shape,
            self.dilation,
            (self.stride, self.stride),
            dilated_shape,
            self.padding,
            input.shape[2:])

    def backward(self, in_gradient):
        '''
        Backpropagation is not implemented for this layer, at least right now;
        the incoming gradient is returned unchanged.
        '''
        return in_gradient
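
The forward method above multiplies each input position by the flattened filter and then adds the resulting patches into the output at strided offsets. The same scatter-add idea can be sketched for a single channel in plain NumPy. The function name `transposed_conv2d_single` is hypothetical, and the sketch leaves out batching, multiple channels, dilation and padding.

```python
import numpy as np


def transposed_conv2d_single(y, w, stride=1):
    """Single-channel transposed convolution by scatter-add.

    Each input entry y[i, j] adds a copy of the filter, scaled by that
    entry, into the output at offset (i * stride, j * stride).
    """
    in_h, in_w = y.shape
    k_h, k_w = w.shape
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = np.zeros((out_h, out_w), dtype=np.result_type(y, w))
    for i in range(in_h):
        for j in range(in_w):
            r, c = i * stride, j * stride
            out[r:r + k_h, c:c + k_w] += y[i, j] * w
    return out


Y = np.array([[37, 47], [67, 77]])
W = np.array([[1, 2], [3, 4]])
X = transposed_conv2d_single(Y, W)
print(X)
print(transposed_conv2d_single(Y, W, stride=2).shape)  # (4, 4)
```

With stride 1, this gives the same \(3\times 3\) matrix as the product \((W^*)^T Y\) in the worked example, without building the unrolled matrix explicitly.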