Let us assume we have an input matrix X of dimensionality 3\times 3 and a 2\times 2 filter W given by the following matrices
\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{bmatrix},
and
\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ w_{10} & w_{11}\end{bmatrix}.
We now introduce the stride hyperparameter S. The stride specifies how many positions the filter W moves at each step as it slides across the matrix X. For further details we strongly recommend the repository on Arithmetic of deep learning by Dumoulin and Visin.
Here we set the stride equal to S=1, which means that, starting with the element x_{00}, the filter acts on one 2\times 2 submatrix at a time, beginning in the upper-left corner and moving one column at a time; when it reaches the end of a row it moves down one row and starts again from the left.
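To make the role of the stride concrete, the following minimal sketch enumerates the 2\times 2 windows that the filter visits on a 3\times 3 input. The function name `window_positions` and the sample values in `X` are illustrative assumptions, not part of the notes; the input is assumed square and no padding is applied.

```python
import numpy as np

def window_positions(n_in, n_filter, stride):
    """Top-left (row, col) index of every filter placement, without padding."""
    n_out = (n_in - n_filter) // stride + 1  # placements along each axis
    # Row-major order: sweep across the columns, then move down one step.
    return [(i * stride, j * stride) for i in range(n_out) for j in range(n_out)]

# Placeholder values standing in for x_00, ..., x_22.
X = np.arange(9).reshape(3, 3)

positions = window_positions(3, 2, 1)
for (i, j) in positions:
    # Each window is the 2x2 submatrix of X that the filter acts on.
    print((i, j), X[i:i + 2, j:j + 2].tolist())
```

With S=1 the filter is placed four times, which is why the output is a 2\times 2 matrix. Calling `window_positions(3, 2, 2)` instead leaves a single placement, since a second window would fall outside the input.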
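We then perform the operation
Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n),
where m,n\in\{0,1\} run over the filter indices (strictly speaking this is a cross-correlation, which is the convention used in most deep learning libraries), and obtain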
\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}.
We can rewrite this operation as a matrix-vector multiplication by flattening the input into a vector \boldsymbol{X}' of length 9 and defining a matrix \boldsymbol{W}' of dimension 4\times 9 as
\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix},
and the new matrix
\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}.
It is easy to verify that performing the matrix-vector multiplication \boldsymbol{W}'\boldsymbol{X}' is the same as the above convolution with stride S=1, that is
\boldsymbol{Y}=(\boldsymbol{X}*\boldsymbol{W}),
is now given by \boldsymbol{W}'\boldsymbol{X}', which is a vector of length 4 instead of the 2\times 2 output matrix obtained originally. Its entries are the elements of \boldsymbol{Y} listed row by row.
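The equivalence can be checked numerically. The sketch below is a minimal illustration using NumPy with randomly chosen values; the helper names `conv2d_valid` and `filter_to_matrix` are hypothetical and introduced only for this example. It assumes square inputs and filters, no padding, and the sliding-window form of the sum that produces the matrix \boldsymbol{Y} written out above.

```python
import numpy as np

def conv2d_valid(X, W, stride=1):
    """Slide W over X without padding and sum the elementwise products."""
    n, f = X.shape[0], W.shape[0]          # assumes square X and W
    n_out = (n - f) // stride + 1
    Y = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            r, c = i * stride, j * stride  # top-left corner of the window
            Y[i, j] = np.sum(X[r:r + f, c:c + f] * W)
    return Y

def filter_to_matrix(W, n, stride=1):
    """Build W' so that W' @ X.flatten() equals the flattened output Y."""
    f = W.shape[0]
    n_out = (n - f) // stride + 1
    W_prime = np.zeros((n_out * n_out, n * n))
    for i in range(n_out):
        for j in range(n_out):
            row = i * n_out + j            # output element y_{ij}, row-major
            for m in range(f):
                for k in range(f):
                    # w_{mk} multiplies x_{(i*stride+m),(j*stride+k)}
                    col = (i * stride + m) * n + (j * stride + k)
                    W_prime[row, col] = W[m, k]
    return W_prime

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
W = rng.normal(size=(2, 2))

Y = conv2d_valid(X, W)                 # 2x2 output matrix
W_prime = filter_to_matrix(W, n=3)     # 4x9 matrix
Y_flat = W_prime @ X.flatten()         # vector of length 4

print(W_prime.shape)
print(np.allclose(Y_flat, Y.flatten()))
```

Reshaping `Y_flat` with `Y_flat.reshape(2, 2)` recovers the original output matrix. The zero pattern of `W_prime` matches the matrix \boldsymbol{W}' given above, with each row holding the four filter weights at the columns of the input elements covered by one window.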