CNNs in more detail, simple example

Let us assume we have an input matrix X of dimensionality 3\times 3 and a 2\times 2 filter W given by the following matrices

\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{bmatrix},

and

\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ w_{10} & w_{11}\end{bmatrix}.

We now introduce the stride hyperparameter S. The stride specifies how many positions the filter W moves at each step as it slides across the matrix X. We strongly recommend the repository accompanying "A guide to convolution arithmetic for deep learning" by Dumoulin and Visin.

Here we set the stride to S=1, which means that, starting with the element x_{00} in the upper-left corner, the filter acts on one 2\times 2 submatrix at a time, moving one column to the right at each step and, at the end of a row, one row down.
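As a quick check of the sizes involved, the number of filter positions along each axis can be worked out from the input size, the filter size and the stride. The symbols N (input size), F (filter size) and O (output size) are introduced here only for illustration and do not appear elsewhere in these notes; the relation assumes no padding.

```latex
O = \left\lfloor \frac{N-F}{S} \right\rfloor + 1
  = \left\lfloor \frac{3-2}{1} \right\rfloor + 1
  = 2 .
```

With N=3, F=2 and S=1 the filter therefore fits in two positions per axis, which gives a 2\times 2 output.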

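Here we perform the operation (strictly speaking a cross-correlation, which is what most deep-learning libraries implement under the name convolution)

Y(i,j)=(X * W)(i,j) = \sum_{m=0}^{1}\sum_{n=0}^{1} X(i+m,j+n)W(m,n),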

and obtain

\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}.
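The sliding-window operation that produces this matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `correlate2d_valid` and the numerical values chosen for X and W are assumptions made for this example, and no padding is applied.

```python
import numpy as np


def correlate2d_valid(X, W, stride=1):
    """Slide W over X (no padding) and sum the elementwise products.

    Hypothetical helper written for this example only.
    """
    k_rows, k_cols = W.shape
    out_rows = (X.shape[0] - k_rows) // stride + 1
    out_cols = (X.shape[1] - k_cols) // stride + 1
    Y = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride
            # 2D patch of X currently covered by the filter
            patch = X[r:r + k_rows, c:c + k_cols]
            Y[i, j] = np.sum(patch * W)
    return Y


# Illustrative values: x_{ij} = 3*i + j and w = [[1, 2], [3, 4]]
X = np.arange(9, dtype=float).reshape(3, 3)
W = np.array([[1.0, 2.0], [3.0, 4.0]])

Y = correlate2d_valid(X, W, stride=1)
print(Y)
```

Each entry of `Y` is the sum of four products, exactly as in the matrix written out above; for instance `Y[0, 0]` equals x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11}.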

We can rewrite this operation as a matrix-vector multiplication by flattening the input row by row into a vector \boldsymbol{X}' of length 9 and defining a matrix \boldsymbol{W}' of dimension 4\times 9 as

\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix},

and the new matrix

\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}.

It is easy to see that the matrix-vector multiplication \boldsymbol{W}'\boldsymbol{X}' reproduces the above convolution with stride S=1, that is, the operation

\boldsymbol{Y}=(\boldsymbol{X}*\boldsymbol{W}),

is now given by \boldsymbol{W}'\boldsymbol{X}', which is a vector of length 4 holding, row by row, the elements of the original 2\times 2 output matrix.
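The equivalence can also be checked numerically. The sketch below builds \boldsymbol{W}' for a general filter and input shape and compares the product with the flattened input against the expected output. The helper name `build_w_prime` and the numerical values are assumptions made for this illustration; a row-major flattening of X and no padding are assumed.

```python
import numpy as np


def build_w_prime(W, in_shape, stride=1):
    """Build the matrix W' so that W' @ X.flatten() equals the strided filter output.

    Hypothetical helper written for this example only.
    """
    k_rows, k_cols = W.shape
    n_rows, n_cols = in_shape
    out_rows = (n_rows - k_rows) // stride + 1
    out_cols = (n_cols - k_cols) // stride + 1
    W_prime = np.zeros((out_rows * out_cols, n_rows * n_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            row = i * out_cols + j  # index of output element (i, j)
            for m in range(k_rows):
                for n in range(k_cols):
                    # row-major index of x_{i*stride+m, j*stride+n}
                    col = (i * stride + m) * n_cols + (j * stride + n)
                    W_prime[row, col] = W[m, n]
    return W_prime


# Illustrative values: x_{ij} = 3*i + j and w = [[1, 2], [3, 4]]
X = np.arange(9, dtype=float).reshape(3, 3)
W = np.array([[1.0, 2.0], [3.0, 4.0]])

W_prime = build_w_prime(W, X.shape, stride=1)  # shape (4, 9)
X_prime = X.reshape(-1)                        # length-9 vector
Y_flat = W_prime @ X_prime                     # length-4 vector
Y = Y_flat.reshape(2, 2)                       # back to the 2x2 layout
print(W_prime)
print(Y)
```

The first row of `W_prime` has the pattern w_{00}, w_{01}, 0, w_{10}, w_{11}, 0, 0, 0, 0, matching the matrix written out above, and reshaping `Y_flat` recovers the 2\times 2 output.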