Week 44, Convolutional Neural Networks (CNN)


CNNs in more detail, simple example

Let us assume we have an input matrix $X$ of dimensionality $3\times 3$ and a $2\times 2$ filter $W$, given by the following matrices

$\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{bmatrix},$

and

$\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ w_{10} & w_{11}\end{bmatrix}.$

We now introduce the stride $S$, a hyperparameter that sets how many elements the filter $W$ moves at each step as it slides over the matrix $X$. We strongly recommend the repository on convolution arithmetic for deep learning by Dumoulin and Visin.

Here we set the stride to $S=1$. Starting at the upper-left element $x_{00}$, the filter acts on one $2\times 2$ submatrix at a time and moves one column to the right at each step; when it reaches the end of a row, it moves one row down and starts again from the first column.
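As a small sketch, assuming square inputs and filters and no padding, the number of filter positions along each axis for an $N\times N$ input, an $F\times F$ filter and stride $S$ is $\lfloor (N-F)/S\rfloor + 1$. The helper names `conv_output_size` and `filter_positions` are hypothetical, introduced only for illustration.

```python
def conv_output_size(n: int, f: int, stride: int) -> int:
    """Number of valid filter positions along one axis (square input, no padding)."""
    return (n - f) // stride + 1


def filter_positions(n: int, f: int, stride: int) -> list[tuple[int, int]]:
    """Top-left corners (row, col) visited by an f x f filter on an n x n input:
    move along a row column by column, then go down to the next row."""
    steps = range(0, n - f + 1, stride)
    return [(i, j) for i in steps for j in steps]


print(conv_output_size(3, 2, 1))
print(filter_positions(3, 2, 1))
```

For $N=3$, $F=2$ and $S=1$ this gives two positions per axis, i.e. the four $2\times 2$ submatrices whose upper-left elements are $x_{00}$, $x_{01}$, $x_{10}$ and $x_{11}$.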

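Here we perform the operation (strictly speaking a cross-correlation, since the filter is not flipped; deep learning libraries nevertheless call this operation convolution)

$Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n),$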

and obtain

$\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}.$
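The sum above can be checked numerically with a direct sliding-window loop. This is a minimal NumPy sketch, not an optimized implementation; the function name `correlate2d_valid` and the numerical values chosen for $X$ and $W$ are hypothetical. Like the formula, it does not flip the filter and uses no padding.

```python
import numpy as np


def correlate2d_valid(X: np.ndarray, W: np.ndarray, stride: int = 1) -> np.ndarray:
    """Y(i, j) = sum_m sum_n X(i*S + m, j*S + n) W(m, n), without padding."""
    fh, fw = W.shape
    out_h = (X.shape[0] - fh) // stride + 1
    out_w = (X.shape[1] - fw) // stride + 1
    Y = np.zeros((out_h, out_w), dtype=np.result_type(X, W))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Elementwise product of the current submatrix with the filter, then sum
            Y[i, j] = np.sum(X[r:r + fh, c:c + fw] * W)
    return Y


X = np.arange(9).reshape(3, 3)   # hypothetical values: x_00, ..., x_22 = 0, ..., 8
W = np.array([[1, 2], [3, 4]])   # hypothetical values: w_00, w_01, w_10, w_11 = 1, 2, 3, 4
print(correlate2d_valid(X, W))
```

Substituting these values into the matrix $\boldsymbol{Y}$ above gives $\begin{bmatrix}27 & 37\\ 57 & 67\end{bmatrix}$, which is what the loop should reproduce.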

We can rewrite this operation as a matrix-vector multiplication by flattening the input row by row into a vector $\boldsymbol{X}'$ of length $9$ and defining a matrix $\boldsymbol{W}'$ of dimension $4\times 9$ as

$\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix},$

and the new matrix

$\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}.$
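The matrix $\boldsymbol{W}'$ does not have to be written out by hand. The sketch below (hypothetical helper `conv_matrix`, NumPy assumed, square input and filter, no padding) places each filter weight in the column of $\boldsymbol{X}'$ that it multiplies, assuming the same row-by-row flattening as in $\boldsymbol{X}'$, which is NumPy's default order for `flatten`.

```python
import numpy as np


def conv_matrix(W: np.ndarray, n: int, stride: int = 1) -> np.ndarray:
    """Build W' so that W' @ X.flatten() equals the no-padding sliding-window
    sum of an n x n input X with the f x f filter W (filter not flipped)."""
    f = W.shape[0]
    out = (n - f) // stride + 1
    Wp = np.zeros((out * out, n * n), dtype=W.dtype)
    for i in range(out):
        for j in range(out):
            row = i * out + j  # output element Y(i, j), stored row by row
            for m in range(f):
                for k in range(f):
                    # Index of x_{(i*S+m)(j*S+k)} in the flattened vector X'
                    col = (i * stride + m) * n + (j * stride + k)
                    Wp[row, col] = W[m, k]
    return Wp


W = np.array([[1, 2], [3, 4]])  # hypothetical stand-ins for w_00, w_01, w_10, w_11
print(conv_matrix(W, n=3))
```

With $n=3$ and $S=1$ the printed $4\times 9$ matrix has the same pattern of weights and zeros as $\boldsymbol{W}'$ above. For larger inputs most entries of $\boldsymbol{W}'$ are zero, which is why this form is mainly useful for analysis rather than as an efficient way to compute the output.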

It is easy to verify that the matrix-vector product $\boldsymbol{W}'\boldsymbol{X}'$ reproduces the convolution above with stride $S=1$; that is,

$\boldsymbol{Y}=\boldsymbol{X}*\boldsymbol{W},$

is now given by $\boldsymbol{W}'\boldsymbol{X}'$, a vector of length $4$ whose entries, read row by row, are the elements of the $2\times 2$ output matrix $\boldsymbol{Y}$.
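A quick numerical check of this equivalence is sketched below, with hypothetical values $w_{00},w_{01},w_{10},w_{11}=1,2,3,4$ and $x_{00},\dots,x_{22}=0,\dots,8$. The $4\times 9$ matrix $\boldsymbol{W}'$ is typed in exactly as above, multiplied by the flattened input, reshaped to $2\times 2$, and compared with a direct sliding-window sum.

```python
import numpy as np

w00, w01, w10, w11 = 1, 2, 3, 4          # hypothetical filter values
W = np.array([[w00, w01], [w10, w11]])
X = np.arange(9).reshape(3, 3)           # hypothetical input: x_00 .. x_22 = 0 .. 8

# The 4 x 9 matrix W' written out entry by entry, as in the text
W_prime = np.array([
    [w00, w01, 0,   w10, w11, 0,   0,   0,   0],
    [0,   w00, w01, 0,   w10, w11, 0,   0,   0],
    [0,   0,   0,   w00, w01, 0,   w10, w11, 0],
    [0,   0,   0,   0,   w00, w01, 0,   w10, w11],
])

X_prime = X.flatten()                    # row-by-row flattening, length 9
y_vec = W_prime @ X_prime                # vector of length 4
Y = y_vec.reshape(2, 2)                  # back to the 2 x 2 output matrix

# Direct sliding-window computation with stride S = 1, for comparison
Y_direct = np.array([[np.sum(X[i:i + 2, j:j + 2] * W) for j in range(2)]
                     for i in range(2)])

print(y_vec)                             # [27 37 57 67]
print(np.array_equal(Y, Y_direct))       # True
```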