The mathematics of CNNs is built on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is an operation (an integration, a summation, etc.) on two functions that produces a third function, one which expresses how the shape of one function is modified by the other. Convolution has a plethora of applications across a variety of disciplines, from statistics to signal processing, computer vision, the solution of differential equations, linear algebra, engineering, and yes, machine learning.
Mathematically, convolution is defined as follows (one-dimensional example): Let us define a continuous function \( y(t) \) given by
$$ y(t) = \int x(a) w(t-a) da, $$where \( x(a) \) represents a so-called input and \( w(t-a) \) is normally called the weight function or kernel.
The above integral is written in a more compact form as
$$ y(t) = \left(x * w\right)(t). $$The discretized version reads
$$ y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). $$Computing the inverse of the above convolution operation is known as deconvolution. Note also that convolution is commutative, that is, \( (x * w)(t) = (w * x)(t) \).
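The discrete sum above can be evaluated directly for finite-length signals. A minimal sketch using NumPy, with an arbitrary input \( x \) and kernel \( w \) chosen purely for illustration, is:

```python
import numpy as np

# Discrete convolution y(t) = sum_a x(a) w(t-a), here with finite-length arrays.
x = np.array([1.0, 2.0, 3.0, 4.0])   # input x(a)
w = np.array([0.25, 0.5, 0.25])      # weight function / kernel w

# Full discrete convolution; the result has length len(x) + len(w) - 1.
y = np.convolve(x, w)
print(y)  # → [0.25 1.   2.   3.   2.75 1.  ]

# Convolution is commutative: (x * w)(t) == (w * x)(t)
assert np.allclose(np.convolve(x, w), np.convolve(w, x))
```

The `assert` at the end checks numerically the commutativity property stated above.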
How can we use this? And what does it mean? Let us study some familiar examples first.