# Transposed Convolutional Layer

**Definition** In *Section 3.1.2 Convolution*, we unrolled the filter
from a \(2\times 2\) matrix into a \(4\times 9\) matrix so that the
convolution could be performed as a matrix multiplication. The
convolution turns the \(3\times 3\) input into a \(2\times 2\) output.
The deconv operation (also called transposed convolution) reverses this
change of shape: in our example it maps a \(2\times 2\) input to a
\(3\times 3\) output. It is an inverse only in terms of shape; it does
not guarantee that the output has the same values as the original
matrix. Below we show how it is computed in the forward pass.

**Forward Pass** To compute the forward pass of the deconv operation, we
simply transpose the unrolled filter matrix, which in our case turns the
\(4\times 9\) matrix \(W^*\) into a \(9\times 4\) matrix. We then
define the deconv operation as \(X=(W^*)^T Y\), i.e. we multiply the
transposed, unrolled filter matrix with the (flattened) output of the
convolution operation.
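The shapes involved can be checked numerically. The sketch below is a minimal NumPy illustration and not part of tinynet: `unroll_filter` is a hypothetical helper, the filter values are arbitrary, and a stride of 1 with no padding is assumed so that the shapes match the \(3\times 3\) input and \(2\times 2\) filter of the example.

```python
import numpy as np


def unroll_filter(w, in_shape):
    """Unroll a 2-D filter into a (num_outputs, num_inputs) matrix.

    Hypothetical helper for illustration; assumes stride 1 and no padding.
    """
    in_h, in_w = in_shape
    f_h, f_w = w.shape
    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1
    unrolled = np.zeros((out_h * out_w, in_h * in_w), dtype=w.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Place the filter at output position (i, j) on an empty input grid,
            # then flatten the grid into one row of the unrolled matrix.
            grid = np.zeros(in_shape, dtype=w.dtype)
            grid[i:i + f_h, j:j + f_w] = w
            unrolled[i * out_w + j] = grid.ravel()
    return unrolled


w = np.array([[1.0, 2.0], [3.0, 4.0]])  # arbitrary example filter
w_star = unroll_filter(w, (3, 3))
print(w_star.shape)    # (4, 9): maps 9 input entries to 4 output entries
print(w_star.T.shape)  # (9, 4): maps 4 entries back to 9 entries
```

Multiplying by `w_star` performs the convolution, while multiplying by `w_star.T` performs the deconv operation on a flattened \(2\times 2\) input.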

We assume that we have an input \(Y\) (identical to the output of the convolution operation in our previous example, hence the notation \(Y\)) and the same filter \(W\) as in that example.

We want to get a \(3\times 3\) matrix as the output of the deconv operation. Recall that we unrolled the filter into the \(4\times 9\) matrix \(W^*\).

We can compute the desired matrix by first transposing the unrolled filter matrix and then multiplying it with our flattened input, which gives a vector with \(9\) entries.

Then we can reshape it back into a \(3\times 3\) matrix as \(X_{3\times 3}=\left[ {\begin{array}{*{20}c}37 & 121 & 94 \\178 & 500 & 342 \\201 & 499 & 308 \end{array} } \right]\)
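The computation can be reproduced with a few lines of NumPy. Note that the values of `w_star` and `y` below are assumptions: they correspond to a filter \(W=\left[\begin{array}{cc}1 & 2\\3 & 4\end{array}\right]\) and an input \(Y=\left[\begin{array}{cc}37 & 47\\67 & 77\end{array}\right]\), chosen because they are consistent with the \(3\times 3\) result quoted above, and they should be checked against the earlier convolution example.

```python
import numpy as np

# Unrolled 4x9 filter matrix for the assumed filter W = [[1, 2], [3, 4]]
# (stride 1, no padding, 3x3 convolution input).
w_star = np.array([
    [1, 2, 0, 3, 4, 0, 0, 0, 0],
    [0, 1, 2, 0, 3, 4, 0, 0, 0],
    [0, 0, 0, 1, 2, 0, 3, 4, 0],
    [0, 0, 0, 0, 1, 2, 0, 3, 4],
])

# Assumed deconv input Y, flattened row by row into a vector of 4 entries.
y = np.array([[37, 47], [67, 77]]).ravel()

# X = (W*)^T Y, followed by a reshape from 9 entries back to 3x3.
x = (w_star.T @ y).reshape(3, 3)
print(x)
```

Under these assumed values, `x` equals the matrix \(X_{3\times 3}\) given in the text.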

As this example shows, the deconv operation does not guarantee that we recover the input of the convolution operation; it only guarantees a matrix with the same shape as that input. Since the entries may exceed the maximum light intensity, i.e. \(255\), we need to renormalize every entry into the range \([0,255]\) when visualizing the deconv result.
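The text does not say which renormalization is used. One common choice is linear min-max scaling, sketched below under that assumption; `rescale_to_uint8` is a hypothetical helper name and is not part of tinynet.

```python
import numpy as np


def rescale_to_uint8(x):
    """Linearly map the entries of x onto [0, 255] (min-max scaling)."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant image has no contrast to stretch; return all zeros.
        return np.zeros(x.shape, dtype=np.uint8)
    scaled = (x - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


x = np.array([[37, 121, 94], [178, 500, 342], [201, 499, 308]])
img = rescale_to_uint8(x)
print(img.min(), img.max())  # 0 255
```

With this scaling the smallest entry (\(37\)) maps to \(0\) and the largest (\(500\)) maps to \(255\), so the result can be displayed as an 8-bit grayscale image.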

```python
from .base import Layer
from tinynet.core import Backend as np


def im2rows(input, inp_shape, filter_shape, dilation, stride, dilated_shape, padding, res_shape):
    """
    Scatter the rows of the unrolled product back into an image, summing overlaps.

    :param input: The unrolled product, of shape (batch, positions, depth * height * width)
    :param inp_shape: The shape of the image to build, before the padding is removed
    :param filter_shape: The shape of the filter (num_filters, depth, height, width)
    :param dilation: The dilation for the filter
    :param stride: The stride for the filter, as (row stride, column stride)
    :param dilated_shape: The dilated shape of the filter
    :param padding: The number of border rows and columns to strip from the result
    :param res_shape: The (rows, columns) grid of positions at which the filter is placed
    :return: The assembled image, with the padding regions removed
    """
    dilated_rows, dilated_cols = dilated_shape
    num_rows, num_cols = res_shape
    res = np.zeros(inp_shape, dtype=input.dtype)
    input = input.reshape(
        (input.shape[0], input.shape[1], filter_shape[1], filter_shape[2], filter_shape[3]))
    for it in range(num_rows * num_cols):
        # Positions are stored in row-major order, so the row index i and the
        # column index j are both recovered from the number of columns.
        i = it // num_cols
        j = it % num_cols
        # accessing via colons: [start:end:step]
        # commas are for different dimensions
        res[:, :, i * stride[0]:i * stride[0] + dilated_rows:dilation,
            j * stride[1]:j * stride[1] + dilated_cols:dilation] += input[:, it, :, :, :]
    if padding != 0:
        # remove the padding regions from every border
        res = res[:, :, padding:res.shape[2] - padding, padding:res.shape[3] - padding]
    return res


class Deconv2D(Layer):
    '''
    Deconv2D performs the deconvolution operation, also known as transposed convolution.
    '''

    def __init__(self, name, input_dim, n_filters, h_filter, w_filter, stride, dilation=1, padding=0):
        '''
        :param name: the name of the layer
        :param input_dim: the input dimension, in the format of (C,H,W)
        :param n_filters: the number of convolution filters
        :param h_filter: the height of the filter
        :param w_filter: the width of the filter
        :param stride: the stride for forward convolution
        :param dilation: the dilation factor for the filters, =1 by default.
        :param padding: the padding to remove from each border of the output, =0 by default.
        '''
        super().__init__(name)
        self.type = 'Deconv2D'
        self.input_channel, self.input_height, self.input_width = input_dim
        self.n_filters = n_filters
        self.h_filter = h_filter
        self.w_filter = w_filter
        self.stride = stride
        self.dilation = dilation
        self.padding = padding
        weight = np.random.randn(
            self.n_filters, self.input_channel, self.h_filter, self.w_filter) / np.sqrt(self.n_filters / 2.0)
        bias = np.zeros((self.n_filters, 1))
        self.weight = self.build_param(weight)
        self.bias = self.build_param(bias)

    def forward(self, input):
        filter_shape = self.weight.tensor.shape
        dilated_shape = (
            (filter_shape[2] - 1) * self.dilation + 1, (filter_shape[3] - 1) * self.dilation + 1)
        # Output size of the transposed convolution before the padding is removed.
        res_shape = (
            (self.input_height - 1) * self.stride + dilated_shape[0],
            (self.input_width - 1) * self.stride + dilated_shape[1]
        )
        # Unroll the input to (batch, positions, channels).
        input_mat = input.reshape(
            (input.shape[0], input.shape[1], -1)).transpose((0, 2, 1))
        # NOTE: this reshape treats the first weight axis as the input-channel
        # axis, so as written the shapes only line up when n_filters equals
        # the number of input channels.
        filters_mat = self.weight.tensor.reshape(
            self.input_channel, -1)
        res_mat = np.matmul(input_mat, filters_mat)
        return im2rows(res_mat, (input.shape[0], filter_shape[1], res_shape[0], res_shape[1]), filter_shape, self.dilation, (self.stride, self.stride), dilated_shape, self.padding, input.shape[2:])

    def backward(self, in_gradient):
        '''
        This function is not needed in computation, at least right now.
        '''
        return in_gradient
```
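The scatter-add idea used in the layer can be illustrated without the tinynet classes. The following is a simplified, single-channel sketch in plain NumPy; `transposed_conv2d_single` is a hypothetical function, it ignores batches, channels, bias and padding, and the values of `y` and `w` are assumed ones chosen to be consistent with the \(3\times 3\) result in the worked example.

```python
import numpy as np


def transposed_conv2d_single(y, w, stride=1, dilation=1):
    """Scatter-add transposed convolution for one channel, without padding."""
    in_h, in_w = y.shape
    f_h, f_w = w.shape
    # Size of the filter footprint once dilation is taken into account.
    dil_h = (f_h - 1) * dilation + 1
    dil_w = (f_w - 1) * dilation + 1
    out = np.zeros(((in_h - 1) * stride + dil_h, (in_w - 1) * stride + dil_w))
    for i in range(in_h):
        for j in range(in_w):
            # Each input entry adds a scaled copy of the filter to the output;
            # overlapping copies are summed.
            r, c = i * stride, j * stride
            out[r:r + dil_h:dilation, c:c + dil_w:dilation] += y[i, j] * w
    return out


y = np.array([[37.0, 47.0], [67.0, 77.0]])  # assumed deconv input
w = np.array([[1.0, 2.0], [3.0, 4.0]])      # assumed filter
x = transposed_conv2d_single(y, w)
print(x.shape)                                           # (3, 3)
print(transposed_conv2d_single(y, w, stride=2).shape)    # (4, 4)
print(transposed_conv2d_single(y, w, dilation=2).shape)  # (4, 4)
```

For stride 1 and no dilation this yields the same \(3\times 3\) matrix as the matrix form \(X=(W^*)^T Y\), and the printed shapes follow the output-size expression used in `forward`: \((H-1)\cdot\text{stride} + (h-1)\cdot\text{dilation} + 1\).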