Let us assume we have an input volume \( V \) given by an image of dimensionality \( 32\times 32 \times 3 \), that is three color channels and \( 32\times 32 \) pixels.
We apply ten filters, each of spatial dimension \( 5\times 5 \) and extending over all three color channels, with stride \( S=1 \) and padding \( P=0 \).
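The spatial size of the output is given by \( (W-F+2P)/S+1=(32-5+0)/1+1=28 \), where \( W=32 \) is the input width and \( F=5 \) the filter width. Each filter sums over the three color channels and produces a single \( 28\times 28 \) image, so the ten filters result in an output volume of dimensionality \( 28\times 28\times 10 \).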
The total number of parameters to train for each filter is then \( 5\times 5\times 3+1 \), where the last parameter is the bias. This gives us \( 76 \) parameters for each filter, leading to a total of \( 760 \) parameters for the ten filters.
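The arithmetic above can be checked with a short Python sketch. The function names `conv_output_size` and `conv_parameter_count` are hypothetical helpers introduced here for illustration and do not belong to any library. The sketch also assumes that the filter must tile the input evenly and raises an error otherwise; deep learning libraries commonly round down instead.

```python
def conv_output_size(width, filter_size, stride=1, padding=0):
    """Spatial output size (W - F + 2P)/S + 1 along one dimension."""
    span = width - filter_size + 2 * padding
    # Assumption of this sketch: reject settings that do not tile evenly.
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input evenly")
    return span // stride + 1


def conv_parameter_count(filter_size, in_channels, n_filters):
    """Return (parameters per filter, total): F*F*C weights plus one bias each."""
    per_filter = filter_size * filter_size * in_channels + 1
    return per_filter, per_filter * n_filters


# The example from the text: 32x32x3 input, ten 5x5 filters, S=1, P=0.
size = conv_output_size(32, 5, stride=1, padding=0)
per_filter, total = conv_parameter_count(5, in_channels=3, n_filters=10)
print(size, per_filter, total)  # 28 76 760
```

Calling the same two functions with other filter sizes and filter counts gives a quick check of hand calculations.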
How many parameters do we get with filters of dimensionality \( 3\times 3 \) (again including the three color channels) if we produce \( 32 \) new images, that is, if we use \( 32 \) filters? Use \( S=1 \) and \( P=0 \).
Note that strides constitute a form of subsampling. Besides being interpreted as a measure of how far the kernel/filter is translated at each step, the stride can also be viewed as determining how much of the output is retained. For instance, moving the kernel by hops of two is equivalent to moving the kernel by hops of one but retaining only the odd-numbered output elements (the first, third, fifth, and so on).
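The equivalence between a stride of two and a subsampled stride of one can be illustrated in one dimension. The following is a minimal pure-Python sketch; the helper `correlate1d`, the signal, and the kernel are all made up for this illustration, and no padding is used.

```python
def correlate1d(signal, kernel, stride=1):
    """1D cross-correlation without padding, moving the kernel in hops of `stride`."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


signal = [1, 4, 2, 7, 3, 8, 5, 6, 9]  # arbitrary illustrative data
kernel = [1, 0, -1]                    # arbitrary illustrative kernel

stride_one = correlate1d(signal, kernel, stride=1)
stride_two = correlate1d(signal, kernel, stride=2)

# Keep the 1st, 3rd, 5th, ... outputs of the stride-1 result
# (Python indices 0, 2, 4, ...).
subsampled = stride_one[::2]

print(stride_two)                # [-1, -1, -2, -4]
print(stride_two == subsampled)  # True
```

Note that the odd-numbered elements in the text correspond to the even Python indices, since Python counts from zero.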