# Cross Entropy Loss

The cross-entropy loss is defined as $$l=-\sum_i^n \hat{y_i}\log(p(y_i))$$, where $$\hat{y_i}$$ is the (one-hot) ground-truth label and $$p(y_i)$$ is the predicted probability of the $$i$$-th class, i.e. we usually apply the cross-entropy loss after a softmax layer. Because of this, we can compute the derivative of the cross-entropy loss directly with respect to the pre-softmax output $$y_i$$ rather than with respect to $$p(y_i)$$.
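As a concrete example, here is a minimal numpy sketch of this loss for a single sample (the names are illustrative, not tinyml's API):

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))   # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw outputs y_i
y_hat = np.array([1.0, 0.0, 0.0])    # one-hot ground truth
p = softmax(logits)

# l = -sum_i y_hat_i * log(p(y_i)); the small epsilon avoids log(0)
loss = -np.sum(y_hat * np.log(p + 1e-20))
```

Since $$\hat{y}$$ is one-hot, the sum collapses to the negative log-probability of the true class.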

Then we have:

$\frac{\partial l}{\partial y_i}=- \sum_j \hat{y_j} \frac{\partial \log(p(y_j))}{\partial y_i} = -\sum_j \hat{y_j} \frac{1}{p(y_j)}\frac{\partial p(y_j)}{\partial y_i}$

The derivative of the softmax has two cases: when $$j=i$$, we have $$\frac{\partial p(y_j)}{\partial y_i}=p(y_i)(1-p(y_i))$$, and when $$j\neq i$$, we have $$\frac{\partial p(y_j)}{\partial y_i}=-p(y_j)p(y_i)$$.
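These two cases can be checked numerically; the sketch below (plain numpy, not tinyml code) compares the analytic softmax Jacobian against central differences:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))   # shift for numerical stability
    return e / e.sum()

y = np.array([0.5, -1.0, 2.0])
p = softmax(y)

# Analytic Jacobian: J[j, i] = dp(y_j)/dy_i = p_j * (delta_ji - p_i),
# which covers both the j = i and j != i cases above.
J = np.diag(p) - np.outer(p, p)

# Numerical column i of the Jacobian via central differences.
i, eps = 0, 1e-6
y_plus, y_minus = y.copy(), y.copy()
y_plus[i] += eps
y_minus[i] -= eps
numerical = (softmax(y_plus) - softmax(y_minus)) / (2 * eps)
```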

Then we have:

$\begin{split}\begin{array}{l} -\sum_j \hat{y_j} \frac{1}{p(y_j)}\frac{\partial p(y_j)}{\partial y_i} \\ = -\hat{y_i}(1-p(y_i))+\sum_{j\neq i} \hat{y_j} \frac{1}{p(y_j)}p(y_j)p(y_i) \\ = -\hat{y_i} + \hat{y_i}p(y_i) + \sum_{j\neq i}\hat{y_j}p(y_i) \\ = -\hat{y_i} + p(y_i)\sum_{j}\hat{y_j} \\ = p(y_i) - \hat{y_i} \end{array}\end{split}$

where the last step uses $$\sum_j \hat{y_j}=1$$, since $$\hat{y}$$ is a one-hot label.

This form is elegant and cheap to compute. Therefore we usually fold the derivative of the softmax into the computation of the cross-entropy loss, rather than backpropagating through the softmax separately.
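Because the derivation above is easy to get wrong, here is a quick finite-difference check (plain numpy, hypothetical names) that the gradient really is $$p(y_i)-\hat{y_i}$$:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

def loss_fn(y, y_hat):
    # cross-entropy after softmax: l = -sum_j y_hat_j * log(p(y_j))
    return -np.sum(y_hat * np.log(softmax(y)))

y = np.array([0.3, 1.2, -0.7])
y_hat = np.array([0.0, 1.0, 0.0])   # one-hot ground truth

analytic = softmax(y) - y_hat       # the derived gradient p - y_hat

# Central-difference estimate of dl/dy_i, one coordinate at a time.
eps = 1e-6
numerical = np.zeros_like(y)
for i in range(y.size):
    y_plus, y_minus = y.copy(), y.copy()
    y_plus[i] += eps
    y_minus[i] -= eps
    numerical[i] = (loss_fn(y_plus, y_hat) - loss_fn(y_minus, y_hat)) / (2 * eps)
```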

The implementation of cross entropy loss in tinyml is as below:

```python
from tinyml.core import Backend as np
from tinyml.layers import Softmax

def cross_entropy_with_softmax_loss(predicted, ground_truth):
    softmax = Softmax('softmax')
    output_probabilities = softmax(predicted)
    n = predicted.shape[0]
    # Mean negative log-likelihood of the correct classes;
    # the small epsilon avoids log(0).
    loss = np.mean(-np.log(
        output_probabilities[np.arange(n), ground_truth] + 1e-20))
    # Gradient p - y_hat, averaged over the batch.
    output_probabilities[np.arange(n), ground_truth] -= 1
    gradient = output_probabilities / n
    return loss, gradient
```
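For readers without tinyml installed, the same routine can be sketched in plain numpy (the `Softmax` layer replaced by an inline softmax; integer class labels per row are assumed):

```python
import numpy as np

def cross_entropy_with_softmax_loss_np(predicted, ground_truth):
    # Row-wise softmax with the usual max-shift for numerical stability.
    shifted = predicted - predicted.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    n = predicted.shape[0]
    # Mean negative log-likelihood of the correct classes.
    loss = np.mean(-np.log(probs[np.arange(n), ground_truth] + 1e-20))
    # Gradient p - y_hat, averaged over the batch.
    probs[np.arange(n), ground_truth] -= 1
    gradient = probs / n
    return loss, gradient

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
loss, gradient = cross_entropy_with_softmax_loss_np(logits, labels)
```

Note that each row of the gradient sums to zero: the softmax probabilities sum to one, and exactly one entry per row has 1 subtracted from it.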