Let us assume we have an input volume V given by an image of dimensionality 32\times 32 \times 3 , that is, 32\times 32 pixels and three color channels.
We apply ten different filters, each of spatial dimension 5\times 5 and extending through all three color channels, with stride S=1 and padding P=0 .
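The width and height of the output are given by (W-F+2P)/S+1=(32-5+2\cdot 0)/1+1=28 , where W=32 is the input width and F=5 is the filter width. Each filter sums over all three color channels and therefore produces a single 28\times 28 image (feature map), so the ten filters give an output volume of dimensionality 28\times 28\times 10 .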
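The output-size arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `conv_output_size` and its argument names are chosen here for illustration, and raising an error when the stride does not divide the span evenly is a design choice of this sketch (many libraries round down instead).

```python
def conv_output_size(w, f, s=1, p=0):
    """Output size (W - F + 2P)/S + 1 of a convolution along one axis."""
    span = w - f + 2 * p
    # Assumption of this sketch: reject settings where the filter
    # does not fit or the stride does not divide the span evenly.
    if span < 0 or span % s != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // s + 1


side = conv_output_size(32, 5, s=1, p=0)
n_filters = 10  # one feature map per filter
print((side, side, n_filters))  # (28, 28, 10)
```

The depth of the output equals the number of filters, not the number of input channels.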
Since each filter extends through all three color channels, the number of parameters to train for each filter is 5\times 5\times 3+1=76 , where the last parameter is the bias. For the ten filters this leads to a total of 760 parameters.
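The parameter count follows the same pattern for any filter size, so it may help to write it as a small function. The sketch below is illustrative only; `conv_param_count` and its argument names are introduced here and are not part of any library.

```python
def conv_param_count(f, c_in, n_filters, bias=True):
    """Trainable parameters of a layer with n_filters filters of size f x f x c_in."""
    # Each filter has f*f weights per input channel, plus one optional bias.
    per_filter = f * f * c_in + (1 if bias else 0)
    return per_filter * n_filters


print(conv_param_count(5, 3, 1))   # 76
print(conv_param_count(5, 3, 10))  # 760
```

Note that the count does not depend on the image size, the stride, or the padding; these affect only the size of the output.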
How many parameters do we get with filters of dimensionality 3\times 3 (extended through the three color channels) if we produce 32 new images, that is, if we apply 32 such filters? Use S=1 and P=0 .
Note that strides constitute a form of subsampling. As an alternative to being interpreted as a measure of how much the kernel/filter is translated, strides can also be viewed as how much of the output is retained. For instance, moving the kernel by hops of two is equivalent to moving the kernel by hops of one but retaining only every other output element (the odd-numbered ones when counting from one, that is, the first, third, fifth, and so on).
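This equivalence is easy to verify numerically in one dimension. The sketch below uses plain Python lists; the helper `correlate1d`, the input `x`, and the kernel `k` are made up for illustration. The slice `[::2]` keeps the elements at zero-based indices 0, 2, 4, ..., which are the odd-numbered elements when counting from one.

```python
def correlate1d(x, k, stride=1):
    """Valid (no padding) 1D cross-correlation of list x with kernel k."""
    n = len(x) - len(k) + 1  # number of kernel positions at stride 1
    return [
        sum(x[i + j] * k[j] for j in range(len(k)))
        for i in range(0, n, stride)
    ]


x = [1, 4, 2, 7, 3, 8, 5, 6, 9]  # arbitrary example input
k = [1, 0, -1]                   # arbitrary example kernel

strided = correlate1d(x, k, stride=2)
subsampled = correlate1d(x, k, stride=1)[::2]
print(strided == subsampled)  # True
```

In practice the strided version is preferred, since it never computes the outputs that would be discarded.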