The following matrix and vector relations will be useful here and for the rest of the course. Vectors are always written as boldfaced lower-case letters and matrices as boldfaced upper-case letters. Since we will often represent our data in terms of matrices and vectors, we discuss in the following how to calculate derivatives of the matrix and vector expressions that are relevant for machine learning.
Let us first introduce some conventions. We assume that \( \boldsymbol{y} \) is a vector of length \( m \), that is, it has \( m \) elements \( y_0,y_1,\dots, y_{m-1} \). By convention we label vector elements starting from zero, as is done for arrays in Python and C/C++, for example. Similarly, we have a vector \( \boldsymbol{x} \) of length \( n \), that is, \( \boldsymbol{x}^T=[x_0,x_1,\dots, x_{n-1}] \).
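As a small illustration of the zero-based labeling, the sketch below stores two vectors as NumPy arrays. The lengths \( m=4 \) and \( n=3 \) and all numerical values are arbitrary choices made for this example only, and the variable names are likewise hypothetical.

```python
import numpy as np

# Hypothetical example values; the lengths m and n are chosen arbitrarily.
m, n = 4, 3
y = np.array([0.5, 1.5, 2.5, 3.5])   # elements y_0, ..., y_{m-1}
x = np.array([1.0, 2.0, 3.0])        # elements x_0, ..., x_{n-1}

first_y = y[0]       # the zeroth element y_0
last_y = y[m - 1]    # the last element y_{m-1}, equivalent to y[-1]

# A one-dimensional NumPy array carries no row/column orientation.
# Reshaping makes the distinction between x and x^T explicit.
x_col = x.reshape(n, 1)   # column vector x, shape (n, 1)
x_row = x_col.T           # row vector x^T, shape (1, n)
```

Note that a one-dimensional array is neither a row nor a column vector; the explicit reshape is only needed when the orientation matters, for example in matrix products.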
We also assume that \( \boldsymbol{y} \) is a function of \( \boldsymbol{x} \) through some given function \( f \):
$$ \boldsymbol{y}=f(\boldsymbol{x}). $$
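To make the relation concrete, here is a minimal sketch of one possible function \( f \) that maps a vector of length \( n=3 \) to a vector of length \( m=2 \). The specific form of `f` is a hypothetical choice for illustration and is not prescribed by the text; any function taking \( n \) inputs to \( m \) outputs would serve equally well.

```python
import numpy as np

def f(x):
    """Hypothetical map from a length-3 vector x to a length-2 vector y."""
    x0, x1, x2 = x
    # y_0 = x_0 * x_1 and y_1 = x_1 + x_2^2 (an arbitrary illustrative choice)
    return np.array([x0 * x1, x1 + x2**2])

x = np.array([1.0, 2.0, 3.0])   # n = 3 input elements
y = f(x)                        # m = 2 output elements: y_0 = 2, y_1 = 11
```

Each element of \( \boldsymbol{y} \) may depend on several elements of \( \boldsymbol{x} \), which is why derivatives of \( \boldsymbol{y} \) with respect to \( \boldsymbol{x} \) are naturally organized by pairs of indices.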