Loss function

If you ever train with multiple losses and want each of them to have the same effect, weigh them according to the magnitude of the gradients they produce, not according to the magnitude of the loss values. A large loss value can still contribute a small gradient, and vice versa.
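For example, with two losses that share parameters, the balancing could look like the following sketch (assumes PyTorch; `loss_a`, `loss_b`, and `shared_params` are illustrative names, not something from the note above):

```python
import torch

def gradient_balanced_weights(loss_a, loss_b, shared_params):
    """Weights that give both losses the same gradient magnitude on shared_params."""
    grads_a = torch.autograd.grad(loss_a, shared_params, retain_graph=True)
    grads_b = torch.autograd.grad(loss_b, shared_params, retain_graph=True)
    norm_a = torch.sqrt(sum((g ** 2).sum() for g in grads_a))
    norm_b = torch.sqrt(sum((g ** 2).sum() for g in grads_b))
    # Weight each loss inversely to its gradient norm: w_a * norm_a == w_b * norm_b.
    w_a = norm_b / (norm_a + norm_b)
    w_b = norm_a / (norm_a + norm_b)
    return w_a, w_b

# Usage: w_a, w_b = gradient_balanced_weights(loss_a, loss_b, shared_params)
#        total_loss = w_a * loss_a + w_b * loss_b
#        total_loss.backward()
```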

Cross Entropy Loss

\mathcal{L} = -\frac{1}{N}\sum_i^N \sum_c^C y_c^i \log \hat{y}^i_c
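As a quick sanity check of the formula, here is a small numpy sketch (the labels and probabilities are made-up numbers):

```python
import numpy as np

y_true = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot labels, N = 2, C = 3
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted probabilities
loss = -np.mean(np.sum(y_true * np.log(y_prob), axis=1))
print(loss)  # ~0.290, i.e. -(log 0.7 + log 0.8) / 2
```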

Gradient Calculation

import numpy as np

# Consider a single data point, i.e. N = 1 (X, W, b, y defined elsewhere)
logits = X @ W + b                                # C-dimensional output, where C is the number of classes
y_pred = np.exp(logits) / np.sum(np.exp(logits))  # softmax to normalize the logits
loss = -np.sum(y * np.log(y_pred))                # y is the one-hot ground-truth label; note the minus sign
z = XW + b\\ \hat{y} = \text{softmax}(z)

Now we will find the derivative of the cross-entropy loss with respect to the logits z.

\frac{\partial \mathcal{L}}{\partial z} = \hat{y} - y
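One way to derive this (using that the labels are one-hot, so \sum_c y_c = 1): substitute the softmax into the loss for a single example,

\mathcal{L} = -\sum_c y_c \log \hat{y}_c = -\sum_c y_c z_c + \log \sum_j e^{z_j}

and differentiate with respect to a single logit z_k:

\frac{\partial \mathcal{L}}{\partial z_k} = -y_k + \frac{e^{z_k}}{\sum_j e^{z_j}} = \hat{y}_k - y_k

A quick numerical check of the result with finite differences (an illustrative numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=3)                        # logits for one example, C = 3
y = np.array([0.0, 1.0, 0.0])                 # one-hot ground truth

def loss(z):
    y_hat = np.exp(z) / np.sum(np.exp(z))     # softmax
    return -np.sum(y * np.log(y_hat))         # cross entropy, N = 1

analytic = np.exp(z) / np.sum(np.exp(z)) - y  # the formula above: y_hat - y
eps = 1e-5
numeric = np.array([(loss(z + eps * e) - loss(z - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(analytic, numeric))         # True
```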
