Example 2

We define a scalar (our cost/loss functions are in general also scalars; think of the mean squared error) as the result of a matrix-vector multiplication,

$$ \alpha = \boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x}, $$

with \( \boldsymbol{y} \) a vector of length \( m \), \( \boldsymbol{A} \) an \( m\times n \) matrix and \( \boldsymbol{x} \) a vector of length \( n \). We assume also that \( \boldsymbol{A} \) does not depend on either of the two vectors. In order to find the derivative of \( \alpha \) with respect to the two vectors, we introduce an intermediate vector \( \boldsymbol{z} \). We define first \( \boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} \), a vector of length \( n \). We have then, using the definition of the Jacobian,

$$ \alpha = \boldsymbol{z}^T\boldsymbol{x}, $$

which means that (using our previous example) we have

$$ \frac{\partial \alpha}{\partial \boldsymbol{x}} = \boldsymbol{z}=\boldsymbol{A}^T\boldsymbol{y}. $$
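As a sanity check (not part of the derivation), the gradient \( \boldsymbol{A}^T\boldsymbol{y} \) can be compared against a central finite-difference approximation of \( \alpha=\boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x} \). This is a minimal sketch in NumPy; the sizes \( m, n \) and the random data are arbitrary choices for illustration.

```python
import numpy as np

# alpha = y^T A x; the claim above is d(alpha)/dx = A^T y.
rng = np.random.default_rng(0)
m, n = 4, 3                      # arbitrary sizes for the check
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x = rng.normal(size=n)

analytic = A.T @ y               # the gradient derived above, length n

# Central finite differences, one component of x at a time.
eps = 1e-6
numeric = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    numeric[i] = (y @ A @ (x + e) - y @ A @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # prints True
```

Since \( \alpha \) is linear in \( \boldsymbol{x} \), the finite-difference quotient recovers the gradient up to floating-point rounding.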

Note that the elements of \( \boldsymbol{z}^T \) and \( \boldsymbol{z} \) are the same; the only difference is that one is just the transpose of the other.

Since \( \alpha \) is a scalar we have \( \alpha =\alpha^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{y} \). Defining now \( \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T \), that is \( \boldsymbol{z}=\boldsymbol{A}\boldsymbol{x} \), a vector of length \( m \), we find that

$$ \frac{\partial \alpha}{\partial \boldsymbol{y}} = \boldsymbol{z}=\boldsymbol{A}\boldsymbol{x}. $$
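The same finite-difference check can be applied to the gradient with respect to \( \boldsymbol{y} \). Again a minimal NumPy sketch with arbitrarily chosen sizes and random data:

```python
import numpy as np

# alpha = y^T A x; the gradient with respect to y should be A x.
rng = np.random.default_rng(1)
m, n = 4, 3                      # arbitrary sizes for the check
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x = rng.normal(size=n)

analytic = A @ x                 # the gradient derived above, length m

# Central finite differences, one component of y at a time.
eps = 1e-6
numeric = np.zeros(m)
for i in range(m):
    e = np.zeros(m)
    e[i] = eps
    numeric[i] = ((y + e) @ A @ x - (y - e) @ A @ x) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # prints True
```

Note that the result has length \( m \), matching \( \boldsymbol{y} \), just as the gradient with respect to \( \boldsymbol{x} \) has length \( n \).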