Example 2

We define a scalar (our cost/loss functions are in general also scalars, just think of the mean squared error) as the result of matrix-vector multiplications

$$ \alpha = \boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x}, $$

with \( \boldsymbol{y} \) a vector of length \( m \), \( \boldsymbol{A} \) an \( m\times n \) matrix and \( \boldsymbol{x} \) a vector of length \( n \). We also assume that \( \boldsymbol{A} \) does not depend on either of the two vectors. In order to find the derivative of \( \alpha \) with respect to the two vectors, we introduce an intermediate vector \( \boldsymbol{z} \). We first define \( \boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} \), a row vector of length \( n \). Using the definition of the Jacobian, we then have

$$ \alpha = \boldsymbol{z}^T\boldsymbol{x}, $$

which means that (using our previous example and keeping track of our convention for the derivative of a scalar with respect to a vector) we have

$$ \frac{\partial \alpha}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{z}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A}. $$

Note that \( \boldsymbol{z}^T \) and \( \boldsymbol{z} \) contain the same elements; the only difference is that one is the transpose of the other. The transpose appears here because we have used the fact that the inner product of two vectors is a scalar.
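To convince ourselves of this result, here is a minimal numerical sketch (assuming NumPy; the dimensions and random arrays are arbitrary choices for illustration). It builds \( \alpha=\boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x} \), computes a forward-difference gradient with respect to \( \boldsymbol{x} \), and compares it with \( \boldsymbol{y}^T\boldsymbol{A} \).

```python
import numpy as np

# Small numerical check of d(alpha)/dx = y^T A (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x = rng.normal(size=n)

alpha = y @ A @ x            # the scalar y^T A x

# Forward-difference gradient with respect to x (exact up to roundoff,
# since alpha is linear in x)
eps = 1e-6
grad_x = np.array([
    ((y @ A @ (x + eps * np.eye(n)[i])) - alpha) / eps
    for i in range(n)
])

print(np.allclose(grad_x, y @ A, atol=1e-4))   # True: d(alpha)/dx = y^T A
```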

Since \( \alpha \) is a scalar we have \( \alpha =\alpha^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{y} \). Defining now \( \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T \), a row vector of length \( m \), we find that

$$ \frac{\partial \alpha}{\partial \boldsymbol{y}} = \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T. $$
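The same kind of numerical check (again a sketch assuming NumPy and arbitrarily chosen dimensions) confirms that the gradient with respect to \( \boldsymbol{y} \) equals \( \boldsymbol{x}^T\boldsymbol{A}^T \), that is, \( \boldsymbol{A}\boldsymbol{x} \) written as a row vector.

```python
import numpy as np

# Same setup as above; now a check that d(alpha)/dy = x^T A^T.
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x = rng.normal(size=n)

alpha = y @ A @ x
eps = 1e-6
grad_y = np.array([
    (((y + eps * np.eye(m)[i]) @ A @ x) - alpha) / eps
    for i in range(m)
])

print(np.allclose(grad_y, x @ A.T, atol=1e-4))  # True: d(alpha)/dy = x^T A^T
```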