We define a scalar (our cost/loss functions are in general also scalars, just think of the mean squared error) as the result of some matrix-vector multiplications
\alpha = \boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x}, with \boldsymbol{y} a vector of length m , \boldsymbol{A} an m\times n matrix and \boldsymbol{x} a vector of length n . We assume also that \boldsymbol{A} does not depend on either of the two vectors. In order to find the derivative of \alpha with respect to the two vectors, we introduce an intermediate vector \boldsymbol{z} . We define first \boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} , a row vector of length n . We have then, using the definition of the Jacobian,
\alpha = \boldsymbol{z}^T\boldsymbol{x}, which means that (using our previous example and keeping track of our definition of the derivative of a scalar) we have
\frac{\partial \alpha}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{z}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{z}^T. Note that the elements of \boldsymbol{z}^T and \boldsymbol{z} are the same; one is simply the transpose of the other. We have the transpose here since we have used that the inner product of two vectors is a scalar.
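As a quick numerical sanity check, the result \partial\alpha/\partial\boldsymbol{x}=\boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} can be verified with central finite differences. This is only an illustrative sketch with randomly drawn \boldsymbol{y} , \boldsymbol{A} and \boldsymbol{x} (the sizes and seed are arbitrary choices, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
y = rng.standard_normal(m)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# the intermediate vector: z^T = y^T A, of length n
z = A.T @ y
alpha = z @ x  # same scalar as y @ A @ x

def f(x):
    """alpha as a function of x, with y and A held fixed."""
    return y @ A @ x

# central finite-difference gradient of alpha with respect to x
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

# the numerical gradient matches z (i.e. the elements of z^T = y^T A)
print(np.allclose(grad, z, atol=1e-6))
```

Since \alpha is linear in \boldsymbol{x} , the central difference is exact up to floating-point rounding, so the comparison holds to high accuracy.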
Since \alpha is a scalar we have \alpha =\alpha^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{y} . Defining now a new intermediate vector \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T , a row vector of length m , we find in the same way that
\frac{\partial \alpha}{\partial \boldsymbol{y}} = \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T.
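The second result, \partial\alpha/\partial\boldsymbol{y}=\boldsymbol{x}^T\boldsymbol{A}^T=(\boldsymbol{A}\boldsymbol{x})^T , can be checked the same way. Again a minimal sketch with arbitrary random inputs (sizes and seed are my choice):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
y = rng.standard_normal(m)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def f(y):
    """alpha as a function of y, with A and x held fixed."""
    return y @ A @ x

# central finite-difference gradient of alpha with respect to y
eps = 1e-6
grad = np.array([(f(y + eps * e) - f(y - eps * e)) / (2 * eps)
                 for e in np.eye(m)])

# the numerical gradient matches A x (the elements of x^T A^T)
print(np.allclose(grad, A @ x, atol=1e-6))
```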