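Let us assume we have an input matrix $X$ of dimensionality $3\times 3$ and a $2\times 2$ filter $W$ given by the following matrices

$$
X=\begin{bmatrix}
x_{00} & x_{01} & x_{02} \\
x_{10} & x_{11} & x_{12} \\
x_{20} & x_{21} & x_{22}
\end{bmatrix},
$$

and

$$
W=\begin{bmatrix}
w_{00} & w_{01} \\
w_{10} & w_{11}
\end{bmatrix}.
$$

We now introduce the hyperparameter $S$, the stride. The stride specifies how many positions the filter $W$ is shifted at each step as it moves across the matrix $X$ during the convolution. We strongly recommend the repository on convolution arithmetic for deep learning by Dumoulin and Visin.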
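Here we set the stride equal to $S=1$. This means that the filter acts on one $2\times 2$ submatrix of $X$ at a time, starting in the upper-left corner at the element $x_{00}$ and shifting by one position per step, column by column along a row and then down to the next row. For our $3\times 3$ input this gives four submatrices and hence a $2\times 2$ output.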
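The way the stride moves the filter can be made concrete with a small sketch. The helper `sliding_windows` below is a hypothetical name introduced only for illustration, and the integer entries of `X` are stand-in values (entry $x_{ij}$ is stored as $3i+j$), so that each printed submatrix can be matched with the elements it covers.

```python
import numpy as np

def sliding_windows(X, k, stride=1):
    """Return (top-left corner, k x k submatrix) for every filter position."""
    # Number of positions along each axis: (input size - filter size) // stride + 1
    n_rows = (X.shape[0] - k) // stride + 1
    n_cols = (X.shape[1] - k) // stride + 1
    windows = []
    for i in range(n_rows):          # move down one row of positions at a time
        for j in range(n_cols):      # move column by column within a row
            r, c = i * stride, j * stride
            windows.append(((r, c), X[r:r + k, c:c + k]))
    return windows

# Stand-in values: x_ij is stored as 3*i + j
X = np.arange(9).reshape(3, 3)

for corner, patch in sliding_windows(X, k=2, stride=1):
    print(corner, patch.tolist())
# (0, 0) [[0, 1], [3, 4]]
# (0, 1) [[1, 2], [4, 5]]
# (1, 0) [[3, 4], [6, 7]]
# (1, 1) [[4, 5], [7, 8]]
```

With `stride=2` the same call returns a single window, since $(3-2)/2$ rounded down, plus one, equals $1$.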
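Here we perform the operation

$$
Y(i,j)=(X\ast W)(i,j)=\sum_{m=0}^{1}\sum_{n=0}^{1}X(i+m,j+n)W(m,n),
$$

where the filter is applied without flipping, as is common in deep learning (a convolution in the strict mathematical sense would contain $X(i-m,j-n)$ instead). We obtain

$$
Y=\begin{bmatrix}
x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\
x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}
\end{bmatrix}.
$$

We can rewrite this operation as a matrix-vector multiplication by flattening the input into a vector $X'$ of length $9$ and defining a matrix $W'$ of dimension $4\times 9$ as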
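$$
X'=\begin{bmatrix}
x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22}
\end{bmatrix},
$$

and the new matrix

$$
W'=\begin{bmatrix}
w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\
0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\
0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\
0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}
\end{bmatrix}.
$$

It is easy to see that the matrix-vector multiplication $W'X'$ gives the same result as the above convolution with stride $S=1$. That is, $Y=X\ast W$ is now given by $W'X'$, which is a vector of length $4$ instead of the original $2\times 2$ output matrix.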
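The equivalence can also be checked numerically. The following is a minimal sketch rather than a general implementation: it assumes square inputs and filters, stride $S=1$ and no padding. The function names `conv2d_direct` and `build_W_prime` are hypothetical helpers introduced here. The direct sum follows the entries of $Y$ as written out above, that is, $Y(i,j)=\sum_m\sum_n X(i+m,j+n)W(m,n)$.

```python
import numpy as np

def conv2d_direct(X, W):
    """Slide the k x k filter W over X (stride 1, no padding, no flipping)."""
    k = W.shape[0]
    out = X.shape[0] - k + 1
    Y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # Elementwise product of the filter with the current submatrix
            Y[i, j] = np.sum(X[i:i + k, j:j + k] * W)
    return Y

def build_W_prime(W, n):
    """Unroll W into an (out*out) x (n*n) matrix acting on a flattened n x n input."""
    k = W.shape[0]
    out = n - k + 1
    W_prime = np.zeros((out * out, n * n))
    for i in range(out):
        for j in range(out):
            for m in range(k):
                for q in range(k):
                    # Output element (i, j) picks up X(i+m, j+q) with weight W(m, q)
                    W_prime[i * out + j, (i + m) * n + (j + q)] = W[m, q]
    return W_prime

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
W = rng.normal(size=(2, 2))

W_prime = build_W_prime(W, n=3)
X_prime = X.reshape(-1)            # row-major flattening, matching X' above
Y_vec = W_prime @ X_prime          # vector of length 4
Y = conv2d_direct(X, W)            # 2 x 2 matrix

print(W_prime.shape)                       # (4, 9)
print(np.allclose(Y_vec, Y.reshape(-1)))   # True
```

Reshaping `Y_vec` with `Y_vec.reshape(2, 2)` recovers the original $2\times 2$ output matrix.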