We define a scalar (our cost/loss functions are in general also scalars, just think of the mean squared error) as the result of some matrix-vector multiplications
\alpha = \boldsymbol{y}^T\boldsymbol{A}\boldsymbol{x}, with \boldsymbol{y} a vector of length m , \boldsymbol{A} an m\times n matrix and \boldsymbol{x} a vector of length n . We assume also that \boldsymbol{A} does not depend on either of the two vectors. In order to find the derivative of \alpha with respect to the two vectors, we introduce an intermediate vector \boldsymbol{z} . We define first \boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} , a row vector of length n . We have then, using the definition of the Jacobian,
\alpha = \boldsymbol{z}^T\boldsymbol{x}, which means that (using our previous example and keeping track of our definition of the derivative of a scalar) we have
\frac{\partial \alpha}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{z}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=\boldsymbol{z}^T. Note that the elements of \boldsymbol{z}^T and \boldsymbol{z} are the same; one is simply the transpose of the other. We have the transpose here since we have used that the inner product of two vectors is a scalar.
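As a quick numerical sanity check, the result \partial\alpha/\partial\boldsymbol{x}=\boldsymbol{z}^T=\boldsymbol{y}^T\boldsymbol{A} can be verified with central finite differences. This is only an illustrative sketch with randomly drawn \boldsymbol{y} , \boldsymbol{A} and \boldsymbol{x} (the sizes and seed are arbitrary choices, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
y = rng.standard_normal(m)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# the intermediate vector: z^T = y^T A, of length n
z = A.T @ y
alpha = z @ x  # same scalar as y @ A @ x

def f(x):
    """alpha as a function of x, with y and A held fixed."""
    return y @ A @ x

# central finite-difference gradient of alpha with respect to x
eps = 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

# the numerical gradient matches z (i.e. the elements of z^T = y^T A)
print(np.allclose(grad, z, atol=1e-6))
```

Since \alpha is linear in \boldsymbol{x} , the central difference is exact up to floating-point rounding, so the comparison holds to high accuracy.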
Since \alpha is a scalar we have \alpha =\alpha^T=\boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{y} . Defining now a new intermediate vector \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T , a row vector of length m , we find in the same way that
\frac{\partial \alpha}{\partial \boldsymbol{y}} = \boldsymbol{z}^T=\boldsymbol{x}^T\boldsymbol{A}^T.
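The second result, \partial\alpha/\partial\boldsymbol{y}=\boldsymbol{x}^T\boldsymbol{A}^T=(\boldsymbol{A}\boldsymbol{x})^T , can be checked the same way. Again a minimal sketch with arbitrary random inputs (sizes and seed are my choice):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
y = rng.standard_normal(m)
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def f(y):
    """alpha as a function of y, with A and x held fixed."""
    return y @ A @ x

# central finite-difference gradient of alpha with respect to y
eps = 1e-6
grad = np.array([(f(y + eps * e) - f(y - eps * e)) / (2 * eps)
                 for e in np.eye(m)])

# the numerical gradient matches A x (the elements of x^T A^T)
print(np.allclose(grad, A @ x, atol=1e-6))
```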