Loss function
If you ever want to weight different losses when training with multiple losses so that each of them has the same effect, weight them according to the magnitude of the gradients they produce, not according to the magnitude of the loss values themselves.
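A minimal sketch of this idea, assuming PyTorch; the toy model and the two losses below are purely illustrative. The second loss is rescaled by the ratio of gradient norms so both losses pull on the parameters with comparable strength:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 3)                       # toy model (placeholder)
x, y = torch.randn(4, 10), torch.randint(0, 3, (4,))
params = [p for p in model.parameters()]

def grad_norm(loss):
    # L2 norm of d(loss)/d(params); retain_graph so we can backprop again later
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.sqrt(sum((g ** 2).sum() for g in grads))

logits = model(x)
loss_a = nn.functional.cross_entropy(logits, y)  # main loss
loss_b = logits.pow(2).mean()                    # illustrative auxiliary loss

# Weight loss_b so its gradient magnitude matches loss_a's,
# instead of matching the raw loss values.
w = (grad_norm(loss_a) / (grad_norm(loss_b) + 1e-8)).detach()
total = loss_a + w * loss_b
total.backward()
```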
// Consider a single data point, i.e. N = 1
logit = XW + b // c-dimensional output, where c is the number of classes
ypred_i = e^{logit_i} / (\sum_k e^{logit_k}) // softmax to normalize the logits
Loss = -\sum_i y_i log(ypred_i) // y_i is the one-hot ground-truth label; note the minus sign
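A minimal NumPy sketch of this forward pass; the shapes and the names `X`, `W`, `b`, `y` below are illustrative (5 features, 3 classes):

```python
import numpy as np

def softmax_cross_entropy(logits, y):
    # logits: shape (c,); y: one-hot ground-truth label, shape (c,)
    z = logits - logits.max()              # shift for numerical stability
    ypred = np.exp(z) / np.exp(z).sum()    # softmax
    loss = -np.sum(y * np.log(ypred))      # cross-entropy
    return loss, ypred

X = np.random.randn(5)           # single data point (N = 1), 5 features
W = np.random.randn(5, 3)        # 3 classes
b = np.zeros(3)
y = np.array([0.0, 1.0, 0.0])    # one-hot label

logit = X @ W + b
loss, ypred = softmax_cross_entropy(logit, y)
```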
Now we will find the derivative of the cross-entropy loss with respect to the logits.
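A sketch of the standard derivation, writing $\hat{y}_i$ for ypred_i and $z_i$ for logit_i. Since $L = -\sum_i y_i \log \hat{y}_i$, we have $\partial L / \partial \hat{y}_i = -y_i / \hat{y}_i$, and the softmax Jacobian is

$$\frac{\partial \hat{y}_i}{\partial z_j} = \hat{y}_i(\delta_{ij} - \hat{y}_j).$$

Applying the chain rule:

$$\frac{\partial L}{\partial z_j} = \sum_i \frac{\partial L}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial z_j} = -\sum_i \frac{y_i}{\hat{y}_i}\,\hat{y}_i(\delta_{ij} - \hat{y}_j) = -y_j + \hat{y}_j \sum_i y_i = \hat{y}_j - y_j,$$

using $\sum_i y_i = 1$ for a one-hot label. So the gradient of softmax plus cross-entropy with respect to the logits is simply ypred - y.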