CNNs in more detail, simple example

Let us assume we have an input matrix X of dimensionality 3×3 and a 2×2 filter W given by the following matrices

$$
X=\begin{bmatrix}
x_{00} & x_{01} & x_{02}\\
x_{10} & x_{11} & x_{12}\\
x_{20} & x_{21} & x_{22}
\end{bmatrix},
$$

and

$$
W=\begin{bmatrix}
w_{00} & w_{01}\\
w_{10} & w_{11}
\end{bmatrix}.
$$

We now introduce the stride hyperparameter S. The stride sets how many positions the filter W moves between successive applications as it slides across the matrix X. We strongly recommend the repository on Arithmetic of deep learning by Dumoulin and Visin.

Here we set the stride to S=1. Starting at the element x00 in the upper-left corner, the filter acts on one 2×2 submatrix of X at a time, moving one column to the right at each step and then continuing on the next row. This gives four filter positions and hence a 2×2 output.
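As a small illustration of how the stride determines the filter positions, the sketch below lists the upper-left corner of every window. The function name `window_corners` is hypothetical, and the sketch assumes a square input, a square filter and no padding.

```python
def window_corners(n_in, n_filter, stride=1):
    """Return the upper-left (row, col) corner of every filter position.

    Assumes a square n_in x n_in input, a square n_filter x n_filter
    filter and no padding.
    """
    # Number of valid filter positions along one axis.
    n_out = (n_in - n_filter) // stride + 1
    # Move along the columns first, then step down to the next row.
    return [(i * stride, j * stride) for i in range(n_out) for j in range(n_out)]


corners = window_corners(3, 2, stride=1)
print(corners)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

For the 3×3 input and 2×2 filter used here, a stride of 1 gives four positions, one per element of the 2×2 output.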

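Here we perform the operation (strictly speaking a cross-correlation, which is what is conventionally called a convolution in CNNs)

$$
Y(i,j)=(X * W)(i,j)=\sum_m\sum_n X(i+m,j+n)W(m,n),
$$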

and obtain

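$$
Y=\begin{bmatrix}
x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11}\\
x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}
\end{bmatrix}.
$$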
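The element-wise sums in Y can be checked numerically. The following is a minimal sketch using NumPy, not an optimized implementation. The function name `conv2d_valid` and the numerical values chosen for X and W are assumptions made for illustration only.

```python
import numpy as np


def conv2d_valid(X, W, stride=1):
    """Slide the filter W over X without padding and sum the products."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    fh, fw = W.shape
    out_h = (X.shape[0] - fh) // stride + 1
    out_w = (X.shape[1] - fw) // stride + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Element-wise product of the filter with the current submatrix.
            Y[i, j] = np.sum(X[r:r + fh, c:c + fw] * W)
    return Y


# Illustrative values only: x_ij = 3*i + j and a filter with entries 1..4.
X = np.arange(9.0).reshape(3, 3)
W = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = conv2d_valid(X, W)
print(Y.tolist())  # [[27.0, 37.0], [57.0, 67.0]]
```

The first element, for example, is 0·1 + 1·2 + 3·3 + 4·4 = 27, matching the expression x00w00+x01w01+x10w10+x11w11.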

We can rewrite this operation as a matrix-vector multiplication by flattening the input row by row into a vector X of length 9 and building a matrix W of dimension 4×9 from the filter elements, as

$$
X=\begin{bmatrix}
x_{00}\\ x_{01}\\ x_{02}\\ x_{10}\\ x_{11}\\ x_{12}\\ x_{20}\\ x_{21}\\ x_{22}
\end{bmatrix},
$$

and the new matrix

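$$
W=\begin{bmatrix}
w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0\\
0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0\\
0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0\\
0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}
\end{bmatrix}.
$$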

Performing the matrix-vector multiplication WX gives the same result as the convolution above with stride S=1, that is

$$
Y=WX,
$$

where Y is now a vector of length 4 rather than the original 2×2 output matrix. Its elements are those of the 2×2 matrix listed row by row.
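The matrix-vector form can be sketched in the same way. The code below builds the 4×9 matrix from the filter elements and multiplies it with the flattened input. The function name `conv_matrix` and the numerical values are assumptions for illustration, and the input is assumed to be flattened row by row.

```python
import numpy as np


def conv_matrix(W, in_shape, stride=1):
    """Build the matrix whose product with the row-flattened input
    reproduces the sliding-window result (no padding)."""
    W = np.asarray(W, dtype=float)
    fh, fw = W.shape
    n_rows, n_cols = in_shape
    out_h = (n_rows - fh) // stride + 1
    out_w = (n_cols - fw) // stride + 1
    M = np.zeros((out_h * out_w, n_rows * n_cols))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j  # index of the output element Y(i, j)
            for m in range(fh):
                for n in range(fw):
                    # Column index of x_{i*stride+m, j*stride+n} in the flattened input.
                    col = (i * stride + m) * n_cols + (j * stride + n)
                    M[row, col] = W[m, n]
    return M


# Illustrative values only: x_ij = 3*i + j and a filter with entries 1..4.
X = np.arange(9.0).reshape(3, 3)
W = np.array([[1.0, 2.0], [3.0, 4.0]])
M = conv_matrix(W, X.shape)
y = M @ X.ravel()

print(M.shape)     # (4, 9)
print(y.tolist())  # [27.0, 37.0, 57.0, 67.0]
```

Reshaping `y` with `y.reshape(2, 2)` recovers the 2×2 output matrix. Each row of `M` holds the four filter elements at the positions of the submatrix they act on, with zeros elsewhere.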