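Checking Dimensions in Deep Learning Models

import torch
from torch import nn

class SmallConvNet(nn.Module):
    def __init__(self, d_channels, m_kernels, kernel_size,
                 input_dimension, output_dimension):
        super().__init__()
        self.pipeline = nn.Sequential(
            # m_kernels kernels, each spanning all d_channels input channels
            nn.Conv2d(d_channels, m_kernels, kernel_size),
            nn.ReLU(),
            # flatten (channels, rows, columns) into one vector per image
            nn.Flatten(),
            nn.Linear(input_dimension, output_dimension),
        )

    def forward(self, x):
        return self.pipeline(x)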
Part A
Let \(\mathbf{x}\in \mathbb{R}^p\) be a vector with \(p\) input features (we’ll assume that the constant feature is included). We can represent a simple neural network with \(k\) outputs as a function \(f: \mathbb{R}^p \to \mathbb{R}^k\) that maps the input vector \(\mathbf{x}\) to an output vector \(\mathbf{y}= f(\mathbf{x})\) using the following formula:
\[ \begin{aligned} f(\mathbf{x}) = \alpha(\alpha(\mathbf{x}^T \mathbf{W}_1) \mathbf{W}_2)\mathbf{W}_3\;, \end{aligned} \]
where \(\alpha\) is a nonlinear function (like the ReLU) which is applied entrywise to its argument.
Each of the three matrices \(\mathbf{W}_i\) has dimensions \(r_i \times c_i\) for some positive integers \(r_i\) and \(c_i\). Please give one choice of \(r_i\) and \(c_i\) for \(i=1,2,3\) that makes the formula valid, in the sense that every vector-matrix product is defined and \(f\) maps an input in \(\mathbb{R}^p\) to an output in \(\mathbb{R}^k\).
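One way to test a proposed answer is to build random matrices of the chosen sizes and try to evaluate the formula. The sketch below assumes NumPy and uses the ReLU for \(\alpha\); the helper names f and check_shapes are hypothetical, chosen only for this illustration. NumPy treats a 1-D array on the left of @ as a row vector, so x @ W1 stands in for \(\mathbf{x}^T \mathbf{W}_1\), and a size mismatch raises a ValueError.

```python
import numpy as np

def relu(z):
    # alpha: the entrywise nonlinearity (ReLU chosen here as an example)
    return np.maximum(z, 0.0)

def f(x, W1, W2, W3, alpha=relu):
    """Compute alpha(alpha(x^T W1) W2) W3 for a 1-D input x of length p."""
    h1 = alpha(x @ W1)   # requires W1.shape[0] == len(x)
    h2 = alpha(h1 @ W2)  # requires W2.shape[0] == W1.shape[1]
    return h2 @ W3       # requires W3.shape[0] == W2.shape[1]; length W3.shape[1]

def check_shapes(p, k, shapes, seed=0):
    """Return True if matrices with the given (r_i, c_i) shapes make f map R^p to R^k."""
    rng = np.random.default_rng(seed)
    Ws = [rng.standard_normal(s) for s in shapes]
    x = rng.standard_normal(p)
    try:
        y = f(x, *Ws)
    except ValueError:
        # some product x^T W1, (.) W2, or (.) W3 is undefined
        return False
    return y.shape == (k,)
```

Calling check_shapes(p, k, [(r1, c1), (r2, c2), (r3, c3)]) with your own values returns True exactly when all three products are defined and the output has length k.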
Part B
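Applying a convolutional kernel of size \(k \times k\) (with stride 1 and no padding) to an image with \(r\) rows, \(c\) columns, and \(d\) channels (e.g., the three color channels of an RGB image) results in an output with \(r - k + 1\) rows, \(c - k + 1\) columns, and \(1\) channel. The kernel extends through all \(d\) input channels (so it has \(k \times k \times d\) weights), which is why the output has a single channel.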
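The output-size rule can be seen from a direct, loop-based implementation: the kernel's top-left corner can be placed at \(r-k+1\) row positions and \(c-k+1\) column positions. The sketch below is a hypothetical helper written with NumPy, assuming a channels-last layout (rows, columns, channels) rather than PyTorch's channels-first layout. Like deep-learning convolution layers, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv_single_kernel(image, kernel):
    """Slide one k x k x d kernel over an r x c x d image (stride 1, no padding).

    Each output entry sums kernel * patch over all rows, columns, and channels,
    so the d input channels collapse into a single output channel.
    """
    r, c, d = image.shape
    k = kernel.shape[0]
    assert kernel.shape == (k, k, d), "kernel must span all d channels"
    # The kernel's top-left corner can sit at rows 0..r-k and columns 0..c-k.
    out = np.empty((r - k + 1, c - k + 1))
    for i in range(r - k + 1):
        for j in range(c - k + 1):
            out[i, j] = np.sum(image[i:i + k, j:j + k, :] * kernel)
    return out
```

For example, an image of shape (5, 7, 3) and a kernel of shape (3, 3, 3) give an output of shape (3, 5), matching \(r-k+1 = 3\) and \(c-k+1 = 5\).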
Consider the simple convolutional neural network SmallConvNet, whose PyTorch code is given before Part A.
Here, d_channels is the number of channels in the input image (corresponding to \(d\) above), m_kernels is the number of convolutional kernels, kernel_size is the size of the convolutional kernel (corresponding to \(k\) above), input_dimension is the length of the flattened output of the convolutional layer (that is, the number of input features of the final nn.Linear layer), and output_dimension is the dimension of the output of the network.
Please give a formula for the required value of input_dimension in terms of d_channels, m_kernels, kernel_size, and the dimensions of the input image (i.e. \(r\) and \(c\)).
Hint: You may find it helpful to check your proposed formula against an example in the lecture notes.
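To check a proposed formula for input_dimension, one option is to run a dummy image through the convolutional part of the network and measure the flattened length. The sketch below assumes PyTorch; flattened_size is a hypothetical helper, and r and c are the rows and columns of the input image. nn.Flatten keeps the batch dimension by default, so the flattened length is the second entry of the output shape.

```python
import torch
from torch import nn

def flattened_size(d_channels, m_kernels, kernel_size, r, c):
    """Measure the length of the flattened conv output for one r x c image."""
    # Same convolutional front end as SmallConvNet, without the Linear layer.
    front = nn.Sequential(
        nn.Conv2d(d_channels, m_kernels, kernel_size),
        nn.ReLU(),
        nn.Flatten(),
    )
    # PyTorch expects (batch, channels, rows, columns); use a batch of one.
    dummy = torch.zeros(1, d_channels, r, c)
    with torch.no_grad():
        return front(dummy).shape[1]
```

Comparing flattened_size(d_channels, m_kernels, kernel_size, r, c) with your formula for a few different values is a quick sanity check. PyTorch also offers nn.LazyLinear, which infers its number of input features on the first forward pass.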