CNNs in brief
In summary:
- A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores); see the sketch after this list
- There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)
- Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function
- Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)
- Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
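To make this concrete, here is a minimal sketch of such a layer stack in PyTorch. The input size (3×32×32), the number of filters, and the 10-class output are illustrative assumptions, not values from the text.

```python
# A minimal CNN as a list of Layers: CONV -> RELU -> POOL -> FC.
import torch
import torch.nn as nn

model = nn.Sequential(
    # CONV: has parameters (the filters) and hyperparameters (kernel size, stride, padding)
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    # RELU: no parameters and no hyperparameters
    nn.ReLU(),
    # POOL: no parameters, but hyperparameters (window size, stride)
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    # FC: has parameters (weights and biases); maps to the class scores
    nn.Linear(16 * 16 * 16, 10),   # 10 classes, assumed for illustration
)

x = torch.randn(1, 3, 32, 32)      # input volume: a 3x32x32 image
scores = model(x)                  # output volume: class scores of shape (1, 10)
print(scores.shape)
```

Each module in the stack takes a 3D volume (plus the batch dimension) and transforms it into another volume through a differentiable function, so the whole network can be trained end to end with backpropagation.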
For more material on convolutional networks, we strongly recommend the course IN5400 – Machine Learning for Image Analysis, as well as the slides of CS231n, taught at Stanford University (consistently ranked among the top computer science programs in the world). Michael Nielsen's book is also a must-read, in particular chapter 6, which deals with CNNs.
However, neither standard feed-forward networks nor CNNs handle data of unknown (variable) length well.
This is where recurrent neural networks (RNNs) come to our rescue.