Cross Entropy Loss

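The cross-entropy loss is defined as \(l=-\sum_{i=1}^n \hat{y_i}\log(p(y_i))\), where \(\hat{y_i}\) is the ground-truth label for class \(i\) (a one-hot vector in the usual classification setting, so \(\sum_i \hat{y_i}=1\)), \(y_i\) is the raw network output for class \(i\), and \(p(y_i)\) is the probability assigned to class \(i\). Cross-entropy loss usually follows a softmax layer, so \(p(y_i)=\frac{e^{y_i}}{\sum_k e^{y_k}}\). Because of this, we can compute the derivative of the loss directly with respect to the raw output \(y_i\) rather than \(p(y_i)\).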
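As a quick illustration of the definition, here is a minimal NumPy sketch for a single sample. It assumes a one-hot label vector and a softmax over the raw outputs; the helper names `softmax` and `cross_entropy` are made up for this example and are not part of Tinynet.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

def cross_entropy(logits, target):
    # l = -sum_i y_hat_i * log(p(y_i)), with p given by the softmax.
    probs = softmax(logits)
    return -np.sum(target * np.log(probs))

logits = np.array([2.0, 1.0, 0.1])   # raw outputs y_i (illustrative values)
target = np.array([1.0, 0.0, 0.0])   # one-hot label y_hat_i
loss = cross_entropy(logits, target)
print(loss)
```

With a one-hot label, only the term for the correct class survives, so the loss reduces to the negative log-probability of that class.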

Applying the chain rule gives:

\[\frac{\partial l}{\partial y_i}=- \sum_j \hat{y_j} \frac{\partial \log(p(y_j))}{\partial y_i} = -\sum_j \hat{y_j} \frac{1}{p(y_j)}\frac{\partial p(y_j)}{\partial y_i}\]

The softmax derivative has two cases. For \(j=i\), \(\frac{\partial p(y_j)}{\partial y_i}=p(y_i)(1-p(y_i))\); for \(j\neq i\), \(\frac{\partial p(y_j)}{\partial y_i}=-p(y_j)p(y_i)\).
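The two cases of the softmax derivative can be written together as the matrix \(\mathrm{diag}(p) - pp^T\). The following sketch compares that expression against central finite differences. The function names and test values are illustrative assumptions, not Tinynet code.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

def softmax_jacobian(logits):
    # Entry [j, i] is dp(y_j)/dy_i: p_i(1 - p_i) on the diagonal, -p_j p_i elsewhere.
    p = softmax(logits)
    return np.diag(p) - np.outer(p, p)

def numerical_jacobian(logits, eps=1e-6):
    # Central differences, perturbing one raw output at a time.
    n = logits.size
    jac = np.zeros((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = eps
        jac[:, i] = (softmax(logits + step) - softmax(logits - step)) / (2 * eps)
    return jac

logits = np.array([0.5, -1.2, 2.0, 0.3])
analytic = softmax_jacobian(logits)
numeric = numerical_jacobian(logits)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

The difference should be close to floating-point noise if the two cases are stated correctly.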

Substituting these into the sum and splitting off the \(j=i\) term:

\[\begin{split}\begin{array}{l} -\sum_j \hat{y_j} \frac{1}{p(y_j)}\frac{\partial p(y_j)}{\partial y_i} \\ = -\hat{y_i}(1-p(y_i))+\sum_{j\neq i} \hat{y_j} \frac{1}{p(y_j)}p(y_j)p(y_i) \\ = -\hat{y_i} + p(y_i)\hat{y_i} + \sum_{j\neq i}\hat{y_j}p(y_i) \\ = -\hat{y_i} + p(y_i)\sum_{j} \hat{y_j} \\ = p(y_i) - \hat{y_i} \end{array}\end{split}\]

The last step uses the fact that the labels sum to one, \(\sum_j \hat{y_j}=1\).

This form is simple and cheap to compute: the gradient is just the predicted probability minus the label. For this reason, the derivative of softmax is usually folded into the cross-entropy loss computation rather than computed separately.
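One way to check the result \(p(y_i) - \hat{y_i}\) is a finite-difference test on the loss itself. The sketch below assumes a one-hot label and uses hypothetical helper names (`softmax`, `loss_fn`); it is not taken from Tinynet.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

def loss_fn(logits, target):
    # Cross-entropy of the softmax probabilities against the label vector.
    return -np.sum(target * np.log(softmax(logits)))

logits = np.array([1.5, -0.3, 0.8])
target = np.array([0.0, 0.0, 1.0])   # one-hot label

# Closed-form gradient from the derivation: p(y_i) - y_hat_i.
analytic_grad = softmax(logits) - target

# Central-difference estimate of dl/dy_i, one coordinate at a time.
eps = 1e-6
numeric_grad = np.zeros_like(logits)
for i in range(logits.size):
    step = np.zeros_like(logits)
    step[i] = eps
    numeric_grad[i] = (loss_fn(logits + step, target)
                       - loss_fn(logits - step, target)) / (2 * eps)

print(analytic_grad)
print(numeric_grad)
```

The two vectors should agree to within the accuracy of the finite-difference step.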

The implementation of cross-entropy loss in Tinynet is shown below:

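from tinynet.core import Backend as np
from tinynet.layers import Softmax

def cross_entropy_with_softmax_loss(predicted, ground_truth):
    # predicted: raw outputs, expected shape (batch_size, num_classes)
    # ground_truth: integer class index per sample, expected shape (batch_size,)
    softmax = Softmax('softmax')
    output_probabilities = softmax(predicted)
    # Default integer dtype: an int8 index would overflow for batches larger than 127.
    sample_indices = np.arange(output_probabilities.shape[0])
    # Mean negative log-probability of the correct class; 1e-20 guards against log(0).
    loss = np.mean(-np.log(output_probabilities[sample_indices, ground_truth] + 1e-20))

    # Gradient w.r.t. the raw outputs: (p - y_hat), averaged over the batch.
    output_probabilities[sample_indices, ground_truth] -= 1
    gradient = output_probabilities / predicted.shape[0]
    return loss, gradient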
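To try the function out without Tinynet installed, the following NumPy-only stand-in mirrors the same steps on a small batch. It assumes that `Backend` behaves like NumPy and that the `Softmax` layer normalizes each row; both are assumptions about Tinynet, and `softmax_rows` and `cross_entropy_with_softmax_loss_np` are hypothetical names used only here.

```python
import numpy as np

def softmax_rows(logits):
    # Row-wise softmax; stands in for Tinynet's Softmax layer (assumed behavior).
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy_with_softmax_loss_np(predicted, ground_truth):
    probs = softmax_rows(predicted)
    rows = np.arange(predicted.shape[0])
    # Mean negative log-probability of the correct class.
    loss = np.mean(-np.log(probs[rows, ground_truth] + 1e-20))
    # Subtract 1 at the correct class (p - y_hat), then average over the batch.
    gradient = probs.copy()
    gradient[rows, ground_truth] -= 1
    return loss, gradient / predicted.shape[0]

predicted = np.array([[2.0, 1.0, 0.1],
                      [0.2, 0.2, 3.0]])   # two samples, three classes
ground_truth = np.array([0, 2])            # integer class labels
loss, gradient = cross_entropy_with_softmax_loss_np(predicted, ground_truth)
print(loss)
print(gradient)
```

Each row of the gradient sums to zero, and the entry for the correct class is negative, so a gradient-descent step raises the raw output of the correct class relative to the others.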