Mathematics of CNNs

The mathematics of CNNs is based on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is an operation (an integration, a summation, etc.) on two functions that produces a third function, expressing how the shape of one is modified by the other. Convolution has a plethora of applications across a variety of disciplines, from statistics to signal processing, computer vision, the solution of differential equations, linear algebra, engineering, and, yes, machine learning.

Mathematically, convolution is defined as follows (in one dimension): let a continuous function \( y(t) \) be given by

$$ y(t) = \int x(a) w(t-a) da, $$

where \( x(a) \) represents the input and \( w(t-a) \) is normally called the weight function or kernel.

The above integral is written in a more compact form as

$$ y(t) = \left(x * w\right)(t). $$

The discretized version reads

$$ y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). $$
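To make the discrete sum concrete, here is a minimal Python sketch that evaluates the finite version of the sum with an explicit double loop and compares the result with NumPy's built-in numpy.convolve. The input sequence and kernel are arbitrary illustration data, not taken from the text.

```python
import numpy as np

def discrete_convolution(x, w):
    """Evaluate y(t) = sum_a x(a) w(t - a) for finite sequences x and w."""
    n_y = len(x) + len(w) - 1          # length of the full convolution
    y = np.zeros(n_y)
    for t in range(n_y):
        for a in range(len(x)):
            if 0 <= t - a < len(w):    # keep only terms where w(t - a) is defined
                y[t] += x[a] * w[t - a]
    return y

# Arbitrary example input and kernel (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.25, 0.5, 0.25])

print(discrete_convolution(x, w))
print(np.convolve(x, w))               # should agree with the loop above
```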

Computing the inverse of the convolution operation above is known as deconvolution.
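As a minimal sketch of deconvolution, one can use the fact that a circular convolution becomes a pointwise product in Fourier space, so the input can be recovered by dividing the transforms, provided the kernel's transform has no zeros and the data are noise-free. This is only one of several approaches, and the signals below are assumed illustration data.

```python
import numpy as np

# Arbitrary illustration data: a signal and a short kernel, zero-padded
# to the same length so that the FFT-based convolution is circular.
x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, 0.0])
w = np.array([1.0, 0.5, 0.25, 0.0, 0.0, 0.0])

# Circular convolution via the FFT: a pointwise product in Fourier space.
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)).real

# Deconvolution: divide the transforms. This works only because fft(w) has
# no zeros and the data are noise-free; realistic deconvolution problems
# require regularization.
x_recovered = np.fft.ifft(np.fft.fft(y) / np.fft.fft(w)).real

print(np.allclose(x, x_recovered))  # True for this noise-free example
```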

How can we use this? And what does it mean? Let us study some familiar examples first.