Matrix Calculus

Gradients, Jacobians, etc., in matrix algebra

Gradient

Let $f: \mathcal{R}^d \rightarrow \mathcal{R}$ be some function.

$$\hat y = f(\mathbf w)$$

Here $\mathbf w \in \mathcal{R}^d$. Suppose we want the derivative of $f$ with respect to each element of $\mathbf w$; that vector of partial derivatives is called the gradient. It is represented as follows:

$$\nabla_\mathbf w f(\mathbf w) = \left[\frac{\partial f}{\partial w_1}, \dots, \frac{\partial f}{\partial w_d}\right]^T$$

The gradient $\nabla_\mathbf w f(\mathbf w)$ is a column vector (note the transpose) of the same dimension as $\mathbf w$.

  • Note that $f(\mathbf w)$ is a scalar-valued function, but $\nabla_\mathbf w f(\mathbf w)$ is actually a vector-valued function.

  • The gradient is perpendicular to the contour lines (level sets) of $f$.

  • The gradient of $f$ points in the direction of steepest ascent. Why? Think using directional derivatives, as in the short derivation below.
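
To see why, here is the standard directional-derivative argument (added here for completeness; it is only hinted at in the notes). The directional derivative of $f$ along a unit vector $\mathbf u$ is

$$D_{\mathbf u} f = \nabla_\mathbf w f(\mathbf w) \cdot \mathbf u = \lVert \nabla_\mathbf w f(\mathbf w) \rVert \cos\theta,$$

where $\theta$ is the angle between $\mathbf u$ and the gradient. This is maximized when $\cos\theta = 1$, i.e. when $\mathbf u$ points along $\nabla_\mathbf w f(\mathbf w)$.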

Gradient in Matrix, Vector forms

Let's say that $f(\mathbf w) = \mathbf w \cdot \mathbf x = \mathbf w^T \mathbf x = \mathbf x^T \mathbf w$, which is simply a linear function. Here $\mathbf x$ is some input vector of the same dimension as $\mathbf w$. Then we have

$$\nabla_{\mathbf w} f(\mathbf w) = \nabla_\mathbf w (\mathbf w^T \mathbf x) = \mathbf x$$
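
As a quick sanity check, here is a minimal NumPy sketch (an illustration added to these notes, not part of any library's API) that approximates the gradient with central finite differences and confirms it equals $\mathbf x$:

```python
import numpy as np

# Numerically approximate the gradient of f(w) = w^T x with central
# finite differences and compare against the closed form, which is x.
rng = np.random.default_rng(0)
d = 5
x = rng.standard_normal(d)
w = rng.standard_normal(d)

def f(w):
    return w @ x  # f(w) = w^T x, a scalar

eps = 1e-6
grad = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps)
                 for e in np.eye(d)])  # rows of eye(d) are the unit vectors

print(np.allclose(grad, x))  # True: the gradient of w^T x is x
```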

Jacobian

Now let $f: \mathcal{R}^n \rightarrow \mathcal{R}^m$. In this case we would have a Jacobian matrix $J$:

$$J = \begin{bmatrix} \frac{\partial \mathbf f}{\partial x_1} & \dots & \frac{\partial \mathbf f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T f_1\\ \vdots\\ \nabla^T f_m \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$

Note: for a scalar-valued function ($m = 1$), the Jacobian and the gradient are transposes of each other. So if you calculate the derivative of a scalar-valued function with respect to a vector using the Jacobian, you get a row vector; to get the gradient, you have to transpose it.
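
Here is a small NumPy sketch (the `jacobian` helper and the example functions are hypothetical, introduced only for illustration) that builds the Jacobian column by column with finite differences, showing the $(m, n)$ shape and the row-vector case for $m = 1$:

```python
import numpy as np

# Finite-difference Jacobian of f: R^n -> R^m, returned with shape (m, n);
# column j holds the partial derivatives with respect to x_j.
def jacobian(f, x, eps=1e-6):
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
            for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

# f: R^2 -> R^3, so J has shape (3, 2): m rows (outputs), n columns (inputs).
f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
print(jacobian(f, np.array([1.0, 2.0])).shape)  # (3, 2)

# For a scalar-valued function (m = 1) the Jacobian is a row vector,
# the transpose of the gradient: here grad(x^T x) = 2x = [2, 4].
g = lambda x: np.atleast_1d(x @ x)
print(jacobian(g, np.array([1.0, 2.0])))  # [[2. 4.]]
```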

Chain Rule

Vector

Let $\mathbf f, \mathbf g$ be two vector-valued functions and $x$ be a scalar. Then

$$\nabla_x \mathbf f = \frac{\partial \mathbf f(\mathbf g(x))}{\partial x} = \frac{\partial \mathbf f}{\partial \mathbf g}\frac{\partial \mathbf g}{\partial x}$$

Now if there are multiple parameters, i.e. the input is a vector $\mathbf x$, then it's

$$\nabla_{\mathbf x} \mathbf f = \frac{\partial \mathbf f(\mathbf g(\mathbf x))}{\partial \mathbf x} = \frac{\partial \mathbf f}{\partial \mathbf g}\frac{\partial \mathbf g}{\partial \mathbf x}$$
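
A numerical check of this identity (again a sketch with hypothetical example functions, reusing the finite-difference `jacobian` helper from above): the Jacobian of the composition $\mathbf f(\mathbf g(\mathbf x))$ should equal the matrix product of the two Jacobians.

```python
import numpy as np

# Verify the vector chain rule numerically: the Jacobian of f(g(x))
# should equal the matrix product (df/dg) @ (dg/dx).
def jacobian(f, x, eps=1e-6):
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
            for e in np.eye(x.size)]
    return np.stack(cols, axis=1)

g = lambda x: np.array([x[0] + x[1], x[0] * x[1]])         # g: R^2 -> R^2
f = lambda u: np.array([np.sin(u[0]), u[0] * u[1], u[1]])  # f: R^2 -> R^3

x = np.array([0.5, -1.3])
lhs = jacobian(lambda t: f(g(t)), x)      # d f(g(x)) / d x, shape (3, 2)
rhs = jacobian(f, g(x)) @ jacobian(g, x)  # (df/dg) @ (dg/dx), shape (3, 2)
print(np.allclose(lhs, rhs, atol=1e-5))   # True
```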

