Logistic regression typically optimizes the log loss for all the observations on which it is trained, which is the same as optimizing the average cross-entropy in the sample. Other loss functions that penalize errors differently can also be used for training, resulting in models with different final test accuracy.
Plot showing different loss functions that can be used to train a binary classifier. Only the case where the target output is 1 is shown. The loss is zero when the output equals the target and increases as the output becomes increasingly incorrect.
Mao, Mohri, and Zhong (2023) give an extensive analysis of the properties of the family of cross-entropy loss functions in machine learning, including theoretical learning guarantees and extensions to adversarial learning. The true probability
{\displaystyle X^{\mathsf {T}}={\begin{pmatrix}1&x_{11}&\dots &x_{1p}\\1&x_{21}&\cdots &x_{2p}\\\vdots &\vdots &&\vdots \\1&x_{n1}&\cdots &x_{np}\\\end{pmatrix}}\in \mathbb {R} ^{n\times (p+1)},}
Noel, Mathew; Banerjee, Arindam; D, Geraldine Bessie Amali; Muthiah-Nakarajan, Venkataraman (March 17, 2023). "Alternate loss functions for classification and robust regression can improve the accuracy of artificial neural networks".
is the distribution of words as predicted by the model. Since the true distribution is unknown, cross-entropy cannot be directly calculated. In these cases, an estimate of cross-entropy is calculated using the following formula:
Shoham, Ron; Permuter, Haim H. (2019). "Amended Cross-Entropy Cost: An Approach for Encouraging Diversity in Classification Ensemble (Brief Announcement)". In Dolev, Shlomi; Hendler, Danny; Lodha, Sachin; Yung, Moti (eds.).
Jason Brownlee, 2019, p. 220: "Logistic loss refers to the loss function commonly used to optimize a logistic regression model. It may also be referred to as logarithmic loss (which is confusing) or simply log loss."
{\displaystyle {\begin{aligned}{\frac {\partial }{\partial \beta _{0}}}L({\boldsymbol {\beta }})&=-\sum _{i=1}^{N}\left[{\frac {y_{i}e^{-\beta _{0}+k_{0}}}{1+e^{-\beta _{0}+k_{0}}}}-(1-y_{i}){\frac {1}{1+e^{-\beta _{0}+k_{0}}}}\right]\\&=-\sum _{i=1}^{N}\left[y_{i}-{\hat {y}}_{i}\right]=\sum _{i=1}^{N}({\hat {y}}_{i}-y_{i}),\end{aligned}}}
{\displaystyle {\mathcal {L}}(\theta ;{\mathbf {x} })=\prod _{i}q_{\theta }(X=x_{i})=\prod _{x_{i}}q_{\theta }(X=x_{i})^{\#x_{i}}=PP^{-N}={\mathrm {e} }^{-N\cdot H(p,q_{\theta })}}
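This identity can be checked numerically; the sketch below uses a made-up sample over a three-symbol alphabet and an assumed model distribution q_theta (neither comes from the article):

```python
import numpy as np

# Illustrative check of the identity log L(theta; x) = -N * H(p, q_theta).
x = np.array([0, 0, 1, 2, 1, 0, 2, 2, 0, 1])   # made-up observed sample, N = 10
q = np.array([0.5, 0.2, 0.3])                  # assumed model distribution q_theta
N = len(x)

p = np.bincount(x, minlength=3) / N            # empirical distribution p of the sample
log_likelihood = np.log(q[x]).sum()            # log of prod_i q_theta(X = x_i)
H_pq = -(p * np.log(q)).sum()                  # cross-entropy H(p, q_theta) in nats

print(log_likelihood, -N * H_pq)               # both about -11.2
```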
This section concerns the estimation of the probability of different possible discrete outcomes. To this end, denote a parametrized family of distributions by
It may be beneficial to train an ensemble of models that have diversity, such that when they are combined, their predictive accuracy is augmented. Assuming a simple ensemble of
{\displaystyle \operatorname {E} _{p}[\ell ]=-\operatorname {E} _{p}\left[{\frac {\ln q(x)}{\ln 2}}\right]=-\operatorname {E} _{p}\left[\log _{2}q(x)\right]=-\sum _{x_{i}}p(x_{i})\,\log _{2}q(x_{i})=-\sum _{x}p(x)\,\log _{2}q(x)=H(p,q).}
7129:{\displaystyle {\frac {\partial }{\partial \beta _{1}}}L({\boldsymbol {\beta }})=-\sum _{i=1}^{N}x_{i1}(y_{i}-{\hat {y}}_{i})=\sum _{i=1}^{N}x_{i1}({\hat {y}}_{i}-y_{i}).}
{\displaystyle {\frac {\partial }{\partial \beta _{1}}}\ln {\frac {1}{1+e^{-\beta _{1}x_{i1}+k_{1}}}}={\frac {x_{i1}e^{k_{1}}}{e^{\beta _{1}x_{i1}}+e^{k_{1}}}},}
{\displaystyle {\frac {\partial }{\partial \beta _{0}}}\ln {\frac {1}{1+e^{-\beta _{0}+k_{0}}}}={\frac {e^{-\beta _{0}+k_{0}}}{1+e^{-\beta _{0}+k_{0}}}},}
equivalent. This has led to some ambiguity in the literature, with some authors attempting to resolve the inconsistency by restating cross-entropy to be
{\displaystyle {\frac {\partial }{\partial \beta _{0}}}\ln \left(1-{\frac {1}{1+e^{-\beta _{0}+k_{0}}}}\right)={\frac {-1}{1+e^{-\beta _{0}+k_{0}}}},}
Cyber Security Cryptography and Machine Learning – Third International Symposium, CSCML 2019, Beer-Sheva, Israel, June 27–28, 2019, Proceedings
needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution
I. J. Good, Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, Ann. of Math. Statistics, 1963
{\displaystyle {\hat {y_{i}}}={\hat {f}}(x_{i1},\dots ,x_{ip})={\frac {1}{1+\exp(-\beta _{0}-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})}},}
, and then its cross-entropy is measured on a test set to assess how accurate the model is in predicting the test data. In this example,
{\displaystyle {\frac {\partial }{\partial \beta _{1}}}\ln \left[1-{\frac {1}{1+e^{-\beta _{1}x_{i1}+k_{1}}}}\right]={\frac {-x_{i1}e^{\beta _{1}x_{i1}}}{e^{\beta _{1}x_{i1}}+e^{k_{1}}}},}
Cross-entropy minimization is frequently used in optimization and rare-event probability estimation. When comparing a distribution
, as it may be understood as an empirical approximation to the probability distribution underlying the scenario. Further denote by
. Repeated occurrences are possible, leading to equal factors in the product. If the count of occurrences of the value equal to
{\displaystyle J(\mathbf {w} )\ =\ {\frac {1}{N}}\sum _{n=1}^{N}H(p_{n},q_{n})\ =\ -{\frac {1}{N}}\sum _{n=1}^{N}\ {\bigg [}y_{n}\log {\hat {y}}_{n}+(1-y_{n})\log(1-{\hat {y}}_{n}){\bigg ]}\,,}
{\displaystyle q_{y=1}={\hat {y}}\equiv g(\mathbf {w} \cdot \mathbf {x} )={\frac {1}{1+e^{-\mathbf {w} \cdot \mathbf {x} }}},}
The gradient of the cross-entropy loss for logistic regression is the same as the gradient of the squared-error loss for linear regression.
{\displaystyle {\hat {y}}_{n}\equiv g(\mathbf {w} \cdot \mathbf {x} _{n})=1/(1+e^{-\mathbf {w} \cdot \mathbf {x} _{n}})}
in bits. Therefore, cross-entropy can be interpreted as the expected message-length per datum when a wrong distribution
Anqi Mao, Mehryar Mohri, Yutao Zhong. Cross-entropy loss functions: Theoretical analysis and applications. ICML 2023.
is a parameter between 0 and 1 that defines the 'diversity' that we would like to establish among the ensemble. When
Shoham, Ron; Permuter, Haim (2020). "Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement".
{\displaystyle {\frac {\partial }{\partial {\boldsymbol {\beta }}}}L({\boldsymbol {\beta }})=X^{T}({\hat {Y}}-Y).}
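A minimal sketch of this matrix form of the gradient, with a made-up design matrix, labels, and parameter vector (illustrative only, not from the article):

```python
import numpy as np

# Made-up data: first column of ones supplies the intercept beta_0.
X = np.array([[1.0,  0.5],
              [1.0, -1.2],
              [1.0,  2.0],
              [1.0,  0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])
beta = np.array([0.1, -0.4])              # some current parameter vector

y_hat = 1.0 / (1.0 + np.exp(-X @ beta))   # logistic predictions, one per row of X
grad = X.T @ (y_hat - y)                  # gradient of the cross-entropy loss w.r.t. beta
print(grad)

# A single gradient-descent update would then be: beta -= learning_rate * grad
```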
for cross-entropy. In the engineering literature, the principle of minimizing KL divergence (Kullback's "
, can be interpreted as a probability, which serves as the basis for classifying the observation. In
The cross entropy arises in classification problems when introducing a logarithm in the guise of the log-likelihood function.
{\displaystyle H(p,q)\ =\ -\sum _{i}p_{i}\log q_{i}\ =\ -y\log {\hat {y}}-(1-y)\log(1-{\hat {y}}).}
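A one-observation numerical check that the generic discrete cross-entropy reduces to this expression (the label y and prediction ŷ below are made up):

```python
import numpy as np

y, y_hat = 1.0, 0.7                     # made-up label and predicted probability
p = np.array([y, 1 - y])                # true Bernoulli distribution over {1, 0}
q = np.array([y_hat, 1 - y_hat])        # predicted Bernoulli distribution

H_pq = -(p * np.log(q)).sum()                              # generic discrete cross-entropy
bce = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)     # the reduced binary form
print(H_pq, bce)                        # both about 0.357
```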
model which can be used to classify observations into two possible classes (often simply labelled 0 and 1).
establishes that any directly decodable coding scheme for coding a message to identify one value
classifiers is assembled via averaging the outputs, then the amended cross-entropy is given by
There are many situations where cross-entropy needs to be measured but the distribution of
{\displaystyle -\int _{\mathcal {X}}P(x)\,\log Q(x)\,\mathrm {d} x=\operatorname {E} _{p}[-\log Q],}
Thomas M. Cover, Joy A. Thomas, Elements of Information Theory, 2nd Edition, Wiley, p. 80
The logistic loss is sometimes called cross-entropy loss. It is also known as log loss.
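scikit-learn exposes this quantity as sklearn.metrics.log_loss; a small illustrative check against a direct computation, using made-up labels and probabilities:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1, 0]               # made-up binary labels
y_prob = [0.9, 0.2, 0.7, 0.6, 0.1]     # made-up predicted probabilities of class 1

# Average cross-entropy computed directly ...
y, p = np.array(y_true), np.array(y_prob)
manual = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# ... matches scikit-learn's log loss.
print(manual, log_loss(y_true, y_prob))   # both about 0.26
```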
The output of the model for a given observation, given a vector of input features
{\displaystyle \log {\mathcal {L}}(\theta ;{\mathbf {x} })=-N\cdot H(p,q_{\theta }).}
{\displaystyle e^{k}=H(p,q^{k})-{\frac {\lambda }{K}}\sum _{j\neq k}H(q^{j},q^{k})}
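A minimal sketch of this amended cost, assuming a toy one-hot target p and made-up outputs q^1, ..., q^K from K = 3 classifiers (none of these values come from the article):

```python
import numpy as np

def H(p, q):
    """Cross-entropy between two discrete distributions (in nats)."""
    return -(p * np.log(q)).sum()

# Assumed target distribution and the outputs of K = 3 made-up classifiers.
p = np.array([0.0, 1.0, 0.0])
q = [np.array([0.2, 0.7, 0.1]),
     np.array([0.1, 0.8, 0.1]),
     np.array([0.3, 0.5, 0.2])]
K, lam = len(q), 0.5   # lam plays the role of the diversity parameter lambda

# Amended cross-entropy cost e^k for each classifier k.
e = [H(p, q[k]) - lam / K * sum(H(q[j], q[k]) for j in range(K) if j != k)
     for k in range(K)]
print(e)
```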
. Lecture Notes in Computer Science. Vol. 11527. Springer. pp. 202–207.
Probability for Machine Learning: Discover How To Harness Uncertainty With Python
as possible, subject to some constraint. In this case the two minimizations are
That is why the expectation is taken over the true probability distribution
we want each classifier to do its best regardless of the ensemble and when
); the terms "log loss" and "cross-entropy loss" are used interchangeably.
is the probability estimate of the model that the i-th word of the text is
{\displaystyle H(p,q)=-\int _{\mathcal {X}}P(x)\,\log Q(x)\,\mathrm {d} x.}
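For continuous densities the integral can be evaluated numerically; the sketch below uses two assumed normal distributions, P with mean 0 and standard deviation 1 and Q with mean 1 and standard deviation 2:

```python
import numpy as np
from scipy import integrate, stats

P = stats.norm(loc=0.0, scale=1.0)   # assumed density P
Q = stats.norm(loc=1.0, scale=2.0)   # assumed density Q

# Cross-entropy H(P, Q) = -integral of P(x) * log Q(x) dx, evaluated numerically.
integrand = lambda x: P.pdf(x) * Q.logpdf(x)
value, _ = integrate.quad(integrand, -np.inf, np.inf)
print(-value)                        # about 1.86 nats
```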
of the true cross-entropy, where the test set is treated as samples from
228:, over the same underlying set of events, measures the average number of
subject to the optimization effort. Consider a given finite sequence of
is the predicted value of the current model. This is also known as the
3004:, and where the product is over the values without double counting. So
{\displaystyle H(T,q)=-\sum _{i=1}^{N}{\frac {1}{N}}\log _{2}q(x_{i})}
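A sketch of this estimate on a made-up test sequence, where q_test holds the (assumed) model probability assigned to each of the N observed words:

```python
import numpy as np

# Assumed model probabilities for the N words of a made-up test set.
q_test = np.array([0.25, 0.05, 0.5, 0.1, 0.2, 0.4])

# Monte Carlo estimate of the cross-entropy, in bits per word.
H_estimate = -np.mean(np.log2(q_test))
print(H_estimate)          # about 2.38 bits per word
print(2 ** H_estimate)     # corresponding perplexity
```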
4313:, we can use cross-entropy to get a measure of dissimilarity between
{\displaystyle H(p,q)=-\sum _{x\in {\mathcal {X}}}p(x)\,\log q(x).}
The Mathematics of Information Coding, Extraction and Distribution
. Similarly, the complementary probability of finding the output
, commonly just a linear function. The probability of the output
of the model is then given by the product over all probabilities
can be seen as representing an implicit probability distribution
{\displaystyle L({\boldsymbol {\beta }})=-\sum _{i=1}^{N}\left[y_{i}\log {\hat {y}}_{i}+(1-y_{i})\log(1-{\hat {y}}_{i})\right].}
is the fixed prior reference distribution, and the distribution
{\displaystyle q(x_{i})=\left({\frac {1}{2}}\right)^{\ell _{i}}}
de Boer, Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005).
Indeed the expected message-length under the true distribution
, by George Cybenko, Dianne P. O'Leary, Jorma Rissanen, 1999,
(In this case, the binary label is often denoted by {−1,+1}.)
sampling. The likelihood assigned to any considered parameter
{\textstyle \prod _{x_{i}}q_{\theta }(X=x_{i})^{-p(X=x_{i})}}
{\displaystyle H(p,q)=H(p)+D_{\mathrm {KL} }(p\parallel q),}
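A quick numerical check of this decomposition, using two small made-up discrete distributions:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # assumed "true" distribution
q = np.array([0.4, 0.4, 0.2])   # assumed model distribution

H_pq = -(p * np.log(q)).sum()        # cross-entropy H(p, q)
H_p = -(p * np.log(p)).sum()         # entropy H(p)
D_kl = (p * np.log(p / q)).sum()     # KL divergence D_KL(p || q)

print(H_pq, H_p + D_kl)              # the two values agree (about 1.055)
```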
we would like the classifier to be as diverse as possible.
In a similar way, we eventually obtain the desired result.
does not agree with the literature and can be misleading.
is assumed while the data actually follows a distribution
is optimized through some appropriate algorithm such as
, it does not affect extremization. So observe that the
Cross-entropy can be used to define a loss function in
3655:; see Cover and Thomas and Good. On the other hand,
3622:
3575:
3551:
3531:
3511:
3459:
3439:
3413:
3389:
3365:
3345:
3234:
3010:
2891:{\displaystyle PP:={\mathrm {e} }^{H(p,q_{\theta })}}
2842:
2800:
2762:
2732:
2712:
2685:
2636:
2616:
2585:
2565:
2545:
2518:
2474:
2450:
2423:
2387:
2367:
2338:
2318:
2212:
2189:
2183:
is the true distribution of words in any corpus, and
2169:
2149:
2125:
1800:
1778:
1755:
1735:
1715:
1695:
1668:
1641:
1589:
1518:
1466:
1439:
1403:
1383:
1344:
1240:
1126:
1104:
1084:
1064:
1044:
1024:
993:
973:
946:
926:
819:
792:
769:
749:
722:
689:
601:
579:
559:
535:
515:
468:
442:
402:
328:
306:
286:
258:
238:
214:
194:
3694:
Cross-entropy loss function and logistic regression
, where a model is created based on a training set
distributions is analogous. We have to assume that
estimated from the training set. In other words,
{\displaystyle q\in \{{\hat {y}},1-{\hat {y}}\}}
are identical up to an additive constant (since
{\displaystyle D_{\mathrm {KL} }(p\parallel q)}
Principle of Minimum Discrimination Information
{\displaystyle D_{\mathrm {KL} }(p\parallel q)}
{\displaystyle H(p,q)=-\operatorname {E} _{p}[\log q],}
is the true label, and the given distribution
amounts to minimization of the cross-entropy.
Machine Learning: A Probabilistic Perspective
is the true probability to be estimated, and
. In fact, cross-entropy is another name for
test accuracy. For example, suppose we have
The definition may be formulated using the
, both take on their minimal values when
, then the frequency of that value equals
is also used for a different concept, the
operator with respect to the distribution
320:over a given set is defined as follows:
7534:
6963:
6129:
5649:
5635:
5460:
4598:of the loss function is then given by:
3845:, the probability is modeled using the
3359:against a fixed reference distribution
1628:{\displaystyle \{x_{1},\ldots ,x_{n}\}}
1505:{\displaystyle \{x_{1},\ldots ,x_{n}\}}
32:
7717:A tutorial on the cross-entropy method
5047:
280:The cross-entropy of the distribution
4198:{\displaystyle q_{y=0}=1-{\hat {y}}.}
3933:is some function of the input vector
3499:However, as discussed in the article
7:
7561:https://arxiv.org/pdf/2304.07288.pdf
4556:samples with each sample indexed by
2672:{\displaystyle q_{\theta }(X=x_{i})}
1058:be probability density functions of
252:, rather than the true distribution
3002:calculation rules for the logarithm
2606:from a training set, obtained from
97:Limiting density of discrete points
6940:
6936:
6709:
6705:
6510:
6506:
6106:
6102:
5935:
5931:
5759:
5755:
5631:
5627:
3585:
3582:
3490:Principle of Minimum Cross-Entropy
3461:
3176:
3136:
2855:
2763:
2733:
1900:
1830:
1802:
1310:
1186:
1175:
647:
644:
478:
475:
354:
25:
7366:is the output probability of the
5708:The proof is as follows. For any
5016:the logistic function as before.
3906:{\displaystyle g(z)=1/(1+e^{-z})}
3319:monotonically increasing function
2332:is the size of the test set, and
108:Asymptotic equipartition property
4962:
4953:
4911:
4902:
4614:
4104:
4074:
4066:
4035:
4027:
3259:
3029:
40:
3545:is optimized to be as close to
3477:{\displaystyle \mathrm {H} (p)}
2444:. The sum is averaged over the
964:with respect to some reference
124:Shannon's source coding theorem
7269:
7243:
7208:
7189:
7120:
7095:
7085:
7045:
7033:
7010:
6967:
6959:
6478:
6453:
6443:
6402:
6287:
6268:
6133:
6125:
5737:{\displaystyle {\hat {y}}_{i}}
5722:
5690:
5678:
5669:
5653:
5645:
5593:
5581:
5565:
5556:
5537:
5522:
5464:
5456:
5428:
5354:
5330:
5292:
5286:
5271:
5234:
5222:
5003:
4997:
4974:
4935:
4921:
4898:
4880:
4840:
4828:
4812:
4803:
4784:
4769:
4690:
4664:
4618:
4610:
4504:
4498:
4483:
4474:
4462:
4453:
4375:
4363:
4294:
4273:
4244:{\displaystyle p\in \{y,1-y\}}
4186:
4039:
4023:
4011:
3900:
3878:
3864:
3858:
3777:More specifically, consider a
3677:
3665:
3638:
3626:
3603:
3591:
3471:
3465:
3301:
3282:
3264:
3248:
3213:
3194:
3132:
3112:
3079:
3060:
3034:
3018:
2985:
2966:
2956:
2936:
2883:
2864:
2823:
2804:
2666:
2647:
2501:Relation to maximum likelihood
2484:
2478:
2404:
2391:
2348:
2342:
2297:
2284:
2228:
2216:
2096:
2084:
2075:
2069:
2049:
2043:
2021:
2008:
1988:
1975:
1940:
1934:
1883:
1877:
1865:
1859:
1820:
1814:
1662:is the length of the code for
1535:
1522:
1460:out of a set of possibilities
1360:
1348:
1305:
1299:
1286:
1280:
1256:
1244:
1213:
1198:
1170:
1164:
1151:
1145:
892:
886:
873:
867:
835:
823:
803:{\displaystyle {\mathcal {X}}}
699:
693:
665:
653:
632:
626:
617:
605:
496:
484:
419:
413:
378:
366:
344:
332:
82:Conditional mutual information
1:
7721:Annals of Operations Research
7518:Maximum-likelihood estimation
3505:, sometimes the distribution
2902:, which can be seen to equal
2464:words of the test. This is a
27:Information-theoretic measure
7668:10.1007/978-3-030-20951-3_18
7309:is the cost function of the
4587:{\displaystyle n=1,\dots ,N}
4207:Having set up our notation,
4111:{\displaystyle \mathbf {w} }
4096:where the vector of weights
3403:is fixed): According to the
2361:is the probability of event
1431:Kraft–McMillan theorem
134:Noisy-channel coding theorem
3502:KullbackāLeibler divergence
2532:{\displaystyle q_{\theta }}
461:KullbackāLeibler divergence
300:relative to a distribution
7758:
7481:{\displaystyle \lambda =1}
7455:{\displaystyle \lambda =0}
3332:
3329:Cross-entropy minimization
2829:{\displaystyle p(X=x_{i})}
2139:is unknown. An example is
743:probability distributions
7513:KullbackāLeibler distance
3317:Since the logarithm is a
2787:{\displaystyle \#x_{i}/N}
2608:conditionally independent
1655:{\displaystyle \ell _{i}}
187:probability distributions
7599:sklearn.metrics.log_loss
7429:{\displaystyle \lambda }
5616:Then we have the result
2410:{\displaystyle q(x_{i})}
7737:Entropy and information
3488:") is often called the
3453:for KL divergence, and
3323:likelihood maximization
2794:. Denote the latter by
2749:{\displaystyle \#x_{i}}
2623:{\displaystyle \theta }
2552:{\displaystyle \theta }
139:ShannonāHartley theorem
7632:Murphy, Kevin (2012).
7482:
7456:
7430:
7410:
7390:
7389:{\displaystyle k^{th}}
7360:
7333:
7332:{\displaystyle k^{th}}
7303:
7276:
7160:
7130:
7071:
6996:
6922:
6691:
6492:
6442:
6376:
6166:
6084:
5917:
5738:
5700:
5608:
5493:
5441:
5246:
5010:
4981:
4858:
4735:
4660:
4588:
4550:
4530:
4514:
4347:
4327:
4307:
4245:
4199:
4142:
4112:
4090:
3973:
3947:
3927:
3907:
3835:
3815:
3795:
3754:
3727:
3684:
3683:{\displaystyle H(p,q)}
3645:
3644:{\displaystyle H(p,q)}
3610:
3559:
3539:
3519:
3478:
3447:
3427:
3397:
3373:
3353:
3311:
3222:
2994:
2892:
2830:
2788:
2750:
2720:
2700:
2673:
2624:
2600:
2573:
2553:
2533:
2491:
2458:
2438:
2411:
2375:
2355:
2326:
2304:
2257:
2197:
2177:
2157:
2133:
2106:
1786:
1766:
1743:
1723:
1703:
1683:
1656:
1629:
1577:
1506:
1454:
1411:
1391:
1367:
1366:{\displaystyle H(p,q)}
1324:
1223:
1112:
1092:
1072:
1052:
1032:
1001:
981:
954:
934:
902:
804:
777:
757:
730:
706:
675:
587:
567:
543:
523:
503:
450:
426:
388:
314:
294:
266:
246:
222:
202:
113:Rateādistortion theory
7483:
7457:
7431:
7411:
7391:
7361:
7359:{\displaystyle q^{k}}
7334:
7304:
7302:{\displaystyle e^{k}}
7277:
7161:
7142:Amended cross-entropy
7131:
7051:
6976:
6923:
6692:
6493:
6422:
6356:
6146:
6085:
5918:
5739:
5701:
5609:
5473:
5442:
5247:
5011:
4982:
4859:
4715:
4640:
4589:
4551:
4527:
4515:
4348:
4328:
4308:
4246:
4200:
4143:
4113:
4091:
3974:
3948:
3928:
3908:
3836:
3816:
3796:
3755:
3753:{\displaystyle q_{i}}
3728:
3726:{\displaystyle p_{i}}
3685:
3646:
3611:
3560:
3540:
3520:
3479:
3448:
3428:
3398:
3374:
3354:
3312:
3223:
2995:
2893:
2831:
2789:
2751:
2721:
2701:
2699:{\displaystyle x_{i}}
2674:
2625:
2601:
2599:{\displaystyle x_{i}}
2574:
2554:
2534:
2492:
2459:
2439:
2437:{\displaystyle x_{i}}
2412:
2376:
2356:
2327:
2305:
2237:
2198:
2178:
2158:
2134:
2107:
1787:
1767:
1744:
1724:
1704:
1684:
1682:{\displaystyle x_{i}}
1657:
1630:
1578:
1507:
1455:
1453:{\displaystyle x_{i}}
1412:
1392:
1368:
1325:
1224:
1113:
1093:
1073:
1053:
1033:
1002:
982:
962:absolutely continuous
955:
935:
903:
805:
778:
758:
731:
707:
676:
588:
568:
544:
524:
504:
451:
427:
425:{\displaystyle E_{p}}
389:
315:
295:
267:
247:
223:
203:
7498:Cross-entropy method
7466:
7440:
7420:
7400:
7370:
7343:
7313:
7286:
7170:
7150:
6931:
6700:
6501:
6093:
5926:
5750:
5712:
5622:
5450:
5255:
5038:
5009:{\displaystyle g(z)}
4991:
4870:
4604:
4560:
4540:
4357:
4337:
4317:
4255:
4211:
4152:
4126:
4100:
3983:
3957:
3937:
3917:
3852:
3825:
3805:
3785:
3737:
3710:
3659:
3620:
3573:
3549:
3529:
3509:
3457:
3437:
3411:
3387:
3379:, cross-entropy and
3363:
3343:
3335:Cross-entropy method
3232:
3008:
2906:
2840:
2798:
2760:
2730:
2710:
2683:
2634:
2614:
2583:
2563:
2543:
2516:
2490:{\displaystyle p(x)}
2472:
2466:Monte Carlo estimate
2448:
2421:
2385:
2365:
2354:{\displaystyle q(x)}
2336:
2316:
2210:
2187:
2167:
2147:
2123:
1798:
1776:
1753:
1733:
1713:
1693:
1666:
1639:
1587:
1516:
1464:
1437:
1401:
1381:
1342:
1238:
1124:
1102:
1082:
1062:
1042:
1022:
991:
971:
944:
924:
817:
790:
767:
747:
720:
705:{\displaystyle H(p)}
687:
599:
577:
557:
533:
513:
466:
440:
400:
326:
304:
284:
256:
236:
212:
192:
77:Directed information
57:Differential entropy
7508:Conditional entropy
7503:Logistic regression
4148:is simply given by
4141:{\displaystyle y=0}
3972:{\displaystyle y=1}
3843:logistic regression
3426:{\displaystyle p=q}
1330: (
908: (
549:(also known as the
62:Conditional entropy
7523:Mutual information
7478:
7452:
7426:
7406:
7386:
7356:
7329:
7299:
7272:
7239:
7156:
7126:
6918:
6687:
6488:
6486:
6080:
5913:
5734:
5696:
5604:
5437:
5242:
5200:
5032:. That is, define
5006:
4977:
4854:
4584:
4546:
4531:
4510:
4399:
4343:
4323:
4303:
4241:
4195:
4138:
4108:
4086:
3969:
3943:
3923:
3903:
3831:
3811:
3791:
3750:
3723:
3680:
3641:
3606:
3555:
3535:
3515:
3474:
3443:
3423:
3393:
3369:
3349:
3307:
3218:
3101:
3049:
2990:
2925:
2888:
2826:
2784:
2746:
2716:
2696:
2669:
2620:
2596:
2569:
2549:
2529:
2487:
2454:
2434:
2407:
2371:
2351:
2322:
2300:
2193:
2173:
2153:
2129:
2102:
2039:
1971:
1782:
1765:{\displaystyle q.}
1762:
1739:
1719:
1699:
1679:
1652:
1625:
1573:
1502:
1450:
1427:information theory
1407:
1387:
1363:
1320:
1219:
1108:
1088:
1068:
1048:
1028:
997:
977:
950:
930:
916:The situation for
898:
863:
800:
773:
753:
726:
702:
671:
583:
563:
539:
519:
499:
446:
422:
384:
310:
290:
262:
242:
218:
198:
179:information theory
72:Mutual information
34:Information theory
18:Cross entropy loss
7677:978-3-030-20950-6
7409:{\displaystyle p}
7224:
7222:
7159:{\displaystyle K}
7098:
7036:
6954:
6913:
6801:
6723:
6682:
6591:
6524:
6456:
6405:
6336:
6263:
6120:
6075:
6014:
5949:
5908:
5827:
5773:
5725:
5681:
5640:
5584:
5525:
5432:
5289:
5274:
5030:linear regression
4883:
4831:
4772:
4738:
4713:
4701:
4695:
4638:
4629:
4623:
4549:{\displaystyle N}
4501:
4456:
4434:
4428:
4390:
4386:
4380:
4346:{\displaystyle q}
4326:{\displaystyle p}
4297:
4276:
4189:
4081:
4014:
3946:{\displaystyle x}
3926:{\displaystyle z}
3847:logistic function
3834:{\displaystyle x}
3814:{\displaystyle 1}
3794:{\displaystyle 0}
3779:binary regression
3558:{\displaystyle q}
3538:{\displaystyle p}
3518:{\displaystyle q}
3446:{\displaystyle 0}
3405:Gibbs' inequality
3396:{\displaystyle p}
3372:{\displaystyle p}
3352:{\displaystyle q}
3085:
3040:
2909:
2719:{\displaystyle i}
2572:{\displaystyle N}
2457:{\displaystyle N}
2374:{\displaystyle x}
2325:{\displaystyle N}
2266:
2196:{\displaystyle q}
2176:{\displaystyle p}
2156:{\displaystyle T}
2141:language modeling
2132:{\displaystyle p}
2030:
1955:
1887:
1785:{\displaystyle p}
1742:{\displaystyle p}
1722:{\displaystyle p}
1702:{\displaystyle q}
1554:
1410:{\displaystyle q}
1390:{\displaystyle p}
1338:NB: The notation
1111:{\displaystyle r}
1091:{\displaystyle q}
1071:{\displaystyle p}
1051:{\displaystyle Q}
1031:{\displaystyle P}
1000:{\displaystyle r}
980:{\displaystyle r}
953:{\displaystyle q}
933:{\displaystyle p}
844:
776:{\displaystyle q}
756:{\displaystyle p}
729:{\displaystyle p}
586:{\displaystyle q}
566:{\displaystyle p}
542:{\displaystyle q}
522:{\displaystyle p}
449:{\displaystyle p}
313:{\displaystyle p}
293:{\displaystyle q}
265:{\displaystyle p}
245:{\displaystyle q}
221:{\displaystyle q}
201:{\displaystyle p}
175:
174:
16:(Redirected from
6645:
6644:
6629:
6628:
6627:
6626:
6625:
6611:
6610:
6597:
6592:
6590:
6589:
6588:
6587:
6586:
6574:
6573:
6561:
6560:
6533:
6525:
6523:
6522:
6521:
6505:
6497:
6495:
6494:
6489:
6487:
6477:
6476:
6464:
6463:
6458:
6457:
6449:
6441:
6436:
6418:
6414:
6413:
6412:
6407:
6406:
6398:
6391:
6390:
6375:
6370:
6346:
6342:
6338:
6337:
6335:
6334:
6333:
6332:
6331:
6319:
6318:
6291:
6286:
6285:
6264:
6262:
6261:
6260:
6259:
6258:
6246:
6245:
6221:
6220:
6219:
6218:
6217:
6205:
6204:
6184:
6183:
6173:
6165:
6160:
6132:
6121:
6119:
6118:
6117:
6101:
6089:
6087:
6086:
6081:
6076:
6074:
6073:
6072:
6071:
6070:
6058:
6057:
6033:
6025:
6020:
6016:
6015:
6013:
6012:
6011:
6010:
6009:
5997:
5996:
5969:
5950:
5948:
5947:
5946:
5930:
5922:
5920:
5919:
5914:
5909:
5907:
5906:
5905:
5904:
5903:
5891:
5890:
5866:
5865:
5864:
5863:
5851:
5850:
5833:
5828:
5826:
5825:
5824:
5823:
5822:
5810:
5809:
5782:
5774:
5772:
5771:
5770:
5754:
5743:
5741:
5740:
5735:
5733:
5732:
5727:
5726:
5718:
5705:
5703:
5702:
5697:
5683:
5682:
5674:
5668:
5667:
5652:
5641:
5639:
5638:
5626:
5613:
5611:
5610:
5605:
5600:
5596:
5592:
5591:
5586:
5585:
5577:
5555:
5554:
5533:
5532:
5527:
5526:
5518:
5508:
5507:
5492:
5487:
5463:
5446:
5444:
5443:
5438:
5433:
5431:
5427:
5426:
5414:
5413:
5395:
5394:
5382:
5381:
5369:
5368:
5337:
5329:
5328:
5307:
5306:
5291:
5290:
5282:
5276:
5275:
5270:
5269:
5260:
5251:
5249:
5248:
5243:
5238:
5237:
5214:
5205:
5204:
5197:
5196:
5177:
5176:
5151:
5137:
5136:
5117:
5116:
5098:
5097:
5078:
5077:
5052:
5051:
5050:
5021:
5015:
5013:
5012:
5007:
4986:
4984:
4983:
4978:
4973:
4972:
4971:
4970:
4965:
4956:
4934:
4920:
4919:
4914:
4905:
4891:
4890:
4885:
4884:
4876:
4863:
4861:
4860:
4855:
4849:
4848:
4839:
4838:
4833:
4832:
4824:
4802:
4801:
4780:
4779:
4774:
4773:
4765:
4755:
4754:
4745:
4744:
4736:
4734:
4729:
4714:
4706:
4699:
4693:
4689:
4688:
4676:
4675:
4659:
4654:
4639:
4631:
4627:
4621:
4617:
4593:
4591:
4590:
4585:
4555:
4553:
4552:
4547:
4519:
4517:
4516:
4511:
4503:
4502:
4494:
4458:
4457:
4449:
4432:
4426:
4425:
4424:
4409:
4408:
4398:
4384:
4378:
4352:
4350:
4349:
4344:
4332:
4330:
4329:
4324:
4312:
4310:
4309:
4304:
4299:
4298:
4290:
4278:
4277:
4269:
4250:
4248:
4247:
4242:
4204:
4202:
4201:
4196:
4191:
4190:
4182:
4170:
4169:
4147:
4145:
4144:
4139:
4120:gradient descent
4117:
4115:
4114:
4109:
4107:
4095:
4093:
4092:
4087:
4082:
4080:
4079:
4078:
4077:
4069:
4046:
4038:
4030:
4016:
4015:
4007:
4001:
4000:
3978:
3976:
3975:
3970:
3952:
3950:
3949:
3944:
3932:
3930:
3929:
3924:
3912:
3910:
3909:
3904:
3899:
3898:
3877:
3840:
3838:
3837:
3832:
3820:
3818:
3817:
3812:
3800:
3798:
3797:
3792:
3766:logarithmic loss
3759:
3757:
3756:
3751:
3749:
3748:
3732:
3730:
3729:
3724:
3722:
3721:
3700:machine learning
3689:
3687:
3686:
3681:
3653:relative entropy
3650:
3648:
3647:
3642:
3615:
3613:
3612:
3607:
3590:
3589:
3588:
3564:
3562:
3561:
3556:
3544:
3542:
3541:
3536:
3524:
3522:
3521:
3516:
3483:
3481:
3480:
3475:
3464:
3452:
3450:
3449:
3444:
3432:
3430:
3429:
3424:
3402:
3400:
3399:
3394:
3378:
3376:
3375:
3370:
3358:
3356:
3355:
3350:
3316:
3314:
3313:
3308:
3300:
3299:
3263:
3262:
3247:
3246:
3227:
3225:
3224:
3219:
3217:
3216:
3212:
3211:
3180:
3179:
3169:
3168:
3150:
3149:
3148:
3147:
3130:
3129:
3111:
3110:
3100:
3099:
3098:
3078:
3077:
3059:
3058:
3048:
3033:
3032:
3017:
3016:
2999:
2997:
2996:
2991:
2989:
2988:
2984:
2983:
2954:
2953:
2935:
2934:
2924:
2923:
2922:
2897:
2895:
2894:
2889:
2887:
2886:
2882:
2881:
2859:
2858:
2835:
2833:
2832:
2827:
2822:
2821:
2793:
2791:
2790:
2785:
2780:
2775:
2774:
2755:
2753:
2752:
2747:
2745:
2744:
2726:) is denoted by
2725:
2723:
2722:
2717:
2706:(for some index
2705:
2703:
2702:
2697:
2695:
2694:
2678:
2676:
2675:
2670:
2665:
2664:
2646:
2645:
2629:
2627:
2626:
2621:
2605:
2603:
2602:
2597:
2595:
2594:
2578:
2576:
2575:
2570:
2558:
2556:
2555:
2550:
2538:
2536:
2535:
2530:
2528:
2527:
2496:
2494:
2493:
2488:
2463:
2461:
2460:
2455:
2443:
2441:
2440:
2435:
2433:
2432:
2416:
2414:
2413:
2408:
2403:
2402:
2380:
2378:
2377:
2372:
2360:
2358:
2357:
2352:
2331:
2329:
2328:
2323:
2309:
2307:
2306:
2301:
2296:
2295:
2277:
2276:
2267:
2259:
2256:
2251:
2202:
2200:
2199:
2194:
2182:
2180:
2179:
2174:
2162:
2160:
2159:
2154:
2138:
2136:
2135:
2130:
2111:
2109:
2108:
2103:
2062:
2061:
2038:
2020:
2019:
2001:
2000:
1987:
1986:
1970:
1969:
1968:
1948:
1944:
1943:
1926:
1925:
1908:
1907:
1892:
1888:
1886:
1869:
1868:
1847:
1838:
1837:
1810:
1809:
1791:
1789:
1788:
1783:
1771:
1769:
1768:
1763:
1748:
1746:
1745:
1740:
1728:
1726:
1725:
1720:
1708:
1706:
1705:
1700:
1688:
1686:
1685:
1680:
1678:
1677:
1661:
1659:
1658:
1653:
1651:
1650:
1634:
1632:
1631:
1626:
1621:
1620:
1602:
1601:
1582:
1580:
1579:
1574:
1572:
1571:
1570:
1569:
1559:
1555:
1547:
1534:
1533:
1511:
1509:
1508:
1503:
1498:
1497:
1479:
1478:
1459:
1457:
1456:
1451:
1449:
1448:
1416:
1414:
1413:
1408:
1396:
1394:
1393:
1388:
1372:
1370:
1369:
1364:
1333:
1329:
1327:
1326:
1321:
1313:
1276:
1275:
1274:
1228:
1226:
1225:
1220:
1194:
1193:
1178:
1141:
1140:
1139:
1117:
1115:
1114:
1109:
1098:with respect to
1097:
1095:
1094:
1089:
1077:
1075:
1074:
1069:
1057:
1055:
1054:
1049:
1037:
1035:
1034:
1029:
1009:Lebesgue measure
1006:
1004:
1003:
998:
986:
984:
983:
978:
959:
957:
956:
951:
939:
937:
936:
931:
911:
907:
905:
904:
899:
862:
861:
860:
809:
807:
806:
801:
799:
798:
782:
780:
779:
774:
762:
760:
759:
754:
735:
733:
732:
727:
711:
709:
708:
703:
680:
678:
677:
672:
652:
651:
650:
592:
590:
589:
584:
573:with respect to
572:
570:
569:
564:
551:relative entropy
548:
546:
545:
540:
528:
526:
525:
520:
509:, divergence of
508:
506:
505:
500:
483:
482:
481:
455:
453:
452:
447:
431:
429:
428:
423:
412:
411:
393:
391:
390:
385:
362:
361:
319:
317:
316:
311:
299:
297:
296:
291:
271:
269:
268:
263:
251:
249:
248:
243:
227:
225:
224:
219:
207:
205:
204:
199:
167:
160:
153:
129:Channel capacity
87:Relative entropy
44:
30:
21:
7757:
7756:
7752:
7751:
7750:
7748:
7747:
7746:
7727:
7726:
7723:134 (1), 19ā67.
7712:
7710:Further reading
7707:
7706:
7690:
7689:
7685:
7678:
7656:
7655:
7651:
7644:
7631:
7630:
7626:
7609:
7608:
7604:
7597:
7593:
7585:
7581:
7570:
7566:
7558:
7554:
7549:
7545:
7540:
7536:
7531:
7494:
7464:
7463:
7438:
7437:
7418:
7417:
7398:
7397:
7373:
7368:
7367:
7346:
7341:
7340:
7316:
7311:
7310:
7289:
7284:
7283:
7259:
7246:
7198:
7173:
7168:
7167:
7148:
7147:
7144:
7110:
7088:
7072:
7026:
7013:
6997:
6943:
6939:
6929:
6928:
6900:
6895:
6877:
6867:
6862:
6861:
6845:
6835:
6830:
6817:
6813:
6788:
6772:
6762:
6754:
6747:
6735:
6731:
6712:
6708:
6698:
6697:
6669:
6664:
6646:
6636:
6631:
6630:
6617:
6612:
6599:
6598:
6578:
6562:
6552:
6544:
6537:
6513:
6509:
6499:
6498:
6485:
6484:
6468:
6446:
6395:
6382:
6381:
6377:
6344:
6343:
6323:
6310:
6302:
6295:
6277:
6250:
6237:
6229:
6222:
6209:
6196:
6188:
6175:
6174:
6171:
6167:
6136:
6109:
6105:
6091:
6090:
6062:
6049:
6041:
6034:
6026:
6001:
5988:
5980:
5973:
5961:
5957:
5938:
5934:
5924:
5923:
5895:
5882:
5874:
5867:
5855:
5842:
5834:
5814:
5801:
5793:
5786:
5762:
5758:
5748:
5747:
5715:
5710:
5709:
5659:
5630:
5620:
5619:
5574:
5546:
5515:
5499:
5498:
5494:
5448:
5447:
5415:
5405:
5383:
5373:
5360:
5341:
5317:
5295:
5261:
5253:
5252:
5209:
5199:
5198:
5185:
5183:
5178:
5165:
5163:
5157:
5156:
5150:
5145:
5139:
5138:
5125:
5123:
5118:
5108:
5106:
5100:
5099:
5086:
5084:
5079:
5069:
5067:
5057:
5041:
5036:
5035:
5019:
4989:
4988:
4960:
4944:
4909:
4873:
4868:
4867:
4821:
4793:
4762:
4746:
4680:
4667:
4602:
4601:
4558:
4557:
4538:
4537:
4416:
4400:
4355:
4354:
4335:
4334:
4315:
4314:
4253:
4252:
4209:
4208:
4155:
4150:
4149:
4124:
4123:
4098:
4097:
4057:
4050:
3986:
3981:
3980:
3955:
3954:
3935:
3934:
3915:
3914:
3887:
3850:
3849:
3823:
3822:
3803:
3802:
3783:
3782:
3740:
3735:
3734:
3713:
3708:
3707:
3696:
3657:
3656:
3618:
3617:
3576:
3571:
3570:
3547:
3546:
3527:
3526:
3507:
3506:
3455:
3454:
3435:
3434:
3409:
3408:
3385:
3384:
3361:
3360:
3341:
3340:
3337:
3331:
3291:
3230:
3229:
3203:
3173:
3157:
3139:
3131:
3121:
3102:
3090:
3069:
3050:
3006:
3005:
2975:
2955:
2945:
2926:
2914:
2904:
2903:
2873:
2852:
2838:
2837:
2813:
2796:
2795:
2766:
2758:
2757:
2736:
2728:
2727:
2708:
2707:
2686:
2681:
2680:
2656:
2637:
2632:
2631:
2612:
2611:
2586:
2581:
2580:
2561:
2560:
2541:
2540:
2519:
2514:
2513:
2503:
2470:
2469:
2446:
2445:
2424:
2419:
2418:
2394:
2383:
2382:
2363:
2362:
2334:
2333:
2314:
2313:
2287:
2268:
2208:
2207:
2185:
2184:
2165:
2164:
2145:
2144:
2121:
2120:
2117:
2053:
2011:
1992:
1978:
1960:
1917:
1916:
1912:
1899:
1870:
1848:
1842:
1829:
1801:
1796:
1795:
1774:
1773:
1751:
1750:
1731:
1730:
1711:
1710:
1691:
1690:
1669:
1664:
1663:
1642:
1637:
1636:
1612:
1593:
1585:
1584:
1561:
1542:
1541:
1525:
1514:
1513:
1489:
1470:
1462:
1461:
1440:
1435:
1434:
1423:
1399:
1398:
1379:
1378:
1340:
1339:
1336:
1331:
1265:
1236:
1235:
1185:
1130:
1122:
1121:
1100:
1099:
1080:
1079:
1060:
1059:
1040:
1039:
1020:
1019:
989:
988:
969:
968:
942:
941:
922:
921:
914:
909:
815:
814:
788:
787:
765:
764:
745:
744:
718:
717:
685:
684:
638:
597:
596:
575:
574:
555:
554:
531:
530:
511:
510:
469:
464:
463:
438:
437:
403:
398:
397:
353:
324:
323:
302:
301:
282:
281:
278:
254:
253:
234:
233:
210:
209:
190:
189:
171:
28:
23:
22:
15:
12:
11:
5:
7755:
7753:
7745:
7744:
7742:Loss functions
7739:
7729:
7728:
7725:
7724:
7711:
7708:
7705:
7704:
7683:
7676:
7649:
7643:978-0262018029
7642:
7624:
7602:
7591:
7579:
7564:
7552:
7543:
7533:
7532:
7530:
7527:
7526:
7525:
7520:
7515:
7510:
7505:
7500:
7493:
7490:
3616:, rather than
3605:
3602:
3599:
3596:
3593:
3587:
3584:
3579:
3554:
3534:
3514:
3473:
3470:
3467:
3463:
3442:
3422:
3419:
3416:
3392:
3368:
3348:
3333:Main article:
3330:
3327:
3306:
3303:
3298:
3294:
3290:
3287:
3284:
3281:
3278:
3275:
3272:
3269:
3266:
3261:
3256:
3253:
3250:
3245:
3240:
3237:
3215:
3210:
3206:
3202:
3199:
3196:
3193:
3190:
3187:
3184:
3178:
3172:
3167:
3164:
3160:
3156:
3153:
3146:
3142:
3138:
3134:
3128:
3124:
3120:
3117:
3114:
3109:
3105:
3097:
3093:
3088:
3084:
3081:
3076:
3072:
3068:
3065:
3062:
3057:
3053:
3047:
3043:
3039:
3036:
3031:
3026:
3023:
3020:
3015:
2987:
2982:
2978:
2974:
2971:
2968:
2965:
2962:
2958:
2952:
2948:
2944:
2941:
2938:
2933:
2929:
2921:
2917:
2912:
2885:
2880:
2876:
2872:
2869:
2866:
2863:
2857:
2851:
2848:
2845:
2825:
2820:
2816:
2812:
2809:
2806:
2803:
2783:
2779:
2773:
2769:
2765:
2743:
2739:
2735:
2715:
2693:
2689:
2668:
2663:
2659:
2655:
2652:
2649:
2644:
2640:
2619:
2593:
2589:
2568:
2548:
2526:
2522:
2507:log-likelihood
2502:
2499:
2486:
2483:
2480:
2477:
2453:
2431:
2427:
2406:
2401:
2397:
2393:
2390:
2370:
2350:
2347:
2344:
2341:
2321:
2299:
2294:
2290:
2286:
2283:
2280:
2275:
2271:
2265:
2262:
2255:
2250:
2247:
2244:
2240:
2236:
2233:
2230:
2227:
2224:
2221:
2218:
2215:
2192:
2172:
2152:
2128:
2116:
2113:
2101:
2098:
2095:
2092:
2089:
2086:
2083:
2080:
2077:
2074:
2071:
2068:
2065:
2060:
2056:
2051:
2048:
2045:
2042:
2037:
2033:
2029:
2026:
2023:
2018:
2014:
2010:
2007:
2004:
1999:
1995:
1990:
1985:
1981:
1977:
1974:
1967:
1963:
1958:
1954:
1951:
1947:
1942:
1939:
1936:
1933:
1929:
1924:
1920:
1915:
1911:
1906:
1902:
1898:
1895:
1891:
1885:
1882:
1879:
1876:
1873:
1867:
1864:
1861:
1858:
1854:
1851:
1845:
1841:
1836:
1832:
1828:
1825:
1822:
1819:
1816:
1813:
1808:
1804:
1781:
1761:
1758:
1738:
1718:
1698:
1676:
1672:
1649:
1645:
1624:
1619:
1615:
1611:
1608:
1605:
1600:
1596:
1592:
1568:
1564:
1558:
1553:
1550:
1545:
1540:
1537:
1532:
1528:
1524:
1521:
1501:
1496:
1492:
1488:
1485:
1482:
1477:
1473:
1469:
1447:
1443:
1422:
1419:
1406:
1386:
1362:
1359:
1356:
1353:
1350:
1347:
1319:
1316:
1312:
1307:
1304:
1301:
1298:
1295:
1292:
1288:
1285:
1282:
1279:
1273:
1268:
1264:
1261:
1258:
1255:
1252:
1249:
1246:
1243:
1233:
1231:and therefore
1218:
1215:
1212:
1209:
1206:
1203:
1200:
1197:
1192:
1188:
1184:
1181:
1177:
1172:
1169:
1166:
1163:
1160:
1157:
1153:
1150:
1147:
1144:
1138:
1133:
1129:
1107:
1087:
1067:
1047:
1027:
996:
976:
949:
929:
897:
894:
891:
888:
885:
882:
879:
875:
872:
869:
866:
859:
854:
851:
847:
843:
840:
837:
834:
831:
828:
825:
822:
812:
797:
783:with the same
772:
752:
725:
701:
698:
695:
692:
670:
667:
664:
661:
658:
655:
649:
646:
641:
637:
634:
631:
628:
625:
622:
619:
616:
613:
610:
607:
604:
582:
562:
538:
518:
498:
495:
492:
489:
486:
480:
477:
472:
445:
434:expected value
421:
418:
415:
410:
406:
383:
380:
377:
374:
371:
368:
365:
360:
356:
352:
349:
346:
343:
340:
337:
334:
331:
309:
289:
277:
274:
261:
241:
217:
197:
173:
172:
170:
169:
162:
155:
147:
144:
143:
142:
141:
136:
131:
126:
118:
117:
116:
115:
110:
102:
101:
100:
99:
94:
89:
84:
79:
74:
69:
64:
59:
54:
46:
45:
37:
36:
26:
24:
14:
13:
10:
9:
6:
4:
3:
2:
7754:
7743:
7740:
7738:
7735:
7734:
7732:
7722:
7718:
7714:
7713:
7709:
7699:
7694:
7687:
7684:
7679:
7673:
7669:
7665:
7661:
7653:
7650:
7645:
7639:
7635:
7628:
7625:
7619:
7614:
7606:
7603:
7600:
7595:
7592:
7588:
7583:
7580:
7577:
7573:
7568:
7565:
7562:
7556:
7553:
7547:
7544:
7538:
7535:
7528:
7524:
7521:
7519:
7516:
7514:
7511:
7509:
7506:
7504:
7501:
7499:
7496:
7495:
7491:
7489:
7475:
7472:
7469:
7449:
7446:
7443:
7423:
7403:
7381:
7378:
7374:
7351:
7347:
7324:
7321:
7317:
7294:
7290:
7264:
7260:
7256:
7251:
7247:
7240:
7235:
7232:
7229:
7225:
7219:
7216:
7211:
7203:
7199:
7195:
7192:
7186:
7183:
7178:
7174:
7153:
7141:
7139:
7136:
7123:
7115:
7111:
7107:
7102:
7092:
7080:
7077:
7073:
7067:
7062:
7059:
7056:
7052:
7048:
7040:
7030:
7023:
7018:
7014:
7005:
7002:
6998:
6992:
6987:
6984:
6981:
6977:
6973:
6970:
6956:
6948:
6944:
6915:
6905:
6901:
6896:
6892:
6885:
6882:
6878:
6872:
6868:
6863:
6853:
6850:
6846:
6840:
6836:
6831:
6825:
6822:
6818:
6814:
6808:
6804:
6793:
6789:
6785:
6780:
6777:
6773:
6767:
6763:
6759:
6755:
6751:
6748:
6744:
6739:
6736:
6732:
6728:
6725:
6717:
6713:
6684:
6674:
6670:
6665:
6661:
6654:
6651:
6647:
6641:
6637:
6632:
6622:
6618:
6613:
6607:
6604:
6600:
6593:
6583:
6579:
6575:
6570:
6567:
6563:
6557:
6553:
6549:
6545:
6541:
6538:
6534:
6529:
6526:
6518:
6514:
6481:
6473:
6469:
6465:
6460:
6450:
6438:
6433:
6430:
6427:
6423:
6419:
6415:
6409:
6399:
6392:
6387:
6383:
6378:
6372:
6367:
6364:
6361:
6357:
6353:
6350:
6348:
6339:
6328:
6324:
6320:
6315:
6311:
6307:
6303:
6299:
6296:
6292:
6282:
6278:
6274:
6271:
6265:
6255:
6251:
6247:
6242:
6238:
6234:
6230:
6226:
6223:
6214:
6210:
6206:
6201:
6197:
6193:
6189:
6185:
6180:
6176:
6168:
6162:
6157:
6154:
6151:
6147:
6143:
6140:
6138:
6122:
6114:
6110:
6077:
6067:
6063:
6059:
6054:
6050:
6046:
6042:
6038:
6035:
6030:
6027:
6021:
6017:
6006:
6002:
5998:
5993:
5989:
5985:
5981:
5977:
5974:
5970:
5965:
5962:
5958:
5954:
5951:
5943:
5939:
5910:
5900:
5896:
5892:
5887:
5883:
5879:
5875:
5871:
5868:
5860:
5856:
5852:
5847:
5843:
5839:
5835:
5829:
5819:
5815:
5811:
5806:
5802:
5798:
5794:
5790:
5787:
5783:
5778:
5775:
5767:
5763:
5745:
5729:
5719:
5706:
5693:
5687:
5684:
5675:
5664:
5660:
5656:
5642:
5617:
5614:
5601:
5597:
5588:
5578:
5571:
5568:
5562:
5559:
5551:
5547:
5543:
5540:
5534:
5529:
5519:
5512:
5509:
5504:
5500:
5495:
5489:
5484:
5481:
5478:
5474:
5470:
5467:
5453:
5434:
5423:
5420:
5416:
5410:
5406:
5402:
5399:
5396:
5391:
5388:
5384:
5378:
5374:
5370:
5365:
5361:
5357:
5351:
5348:
5345:
5342:
5338:
5333:
5325:
5322:
5318:
5314:
5311:
5308:
5303:
5300:
5296:
5283:
5277:
5266:
5262:
5239:
5231:
5228:
5225:
5219:
5216:
5206:
5201:
5193:
5190:
5186:
5180:
5173:
5170:
5166:
5160:
5153:
5147:
5142:
5133:
5130:
5126:
5120:
5113:
5109:
5103:
5094:
5091:
5087:
5081:
5074:
5070:
5064:
5058:
5053:
5042:
5033:
5031:
5027:
5023:
5017:
5000:
4994:
4967:
4957:
4949:
4945:
4941:
4938:
4931:
4927:
4924:
4916:
4906:
4895:
4892:
4887:
4877:
4864:
4851:
4835:
4825:
4818:
4815:
4809:
4806:
4798:
4794:
4790:
4787:
4781:
4776:
4766:
4759:
4756:
4751:
4747:
4731:
4726:
4723:
4720:
4716:
4710:
4707:
4702:
4696:
4685:
4681:
4677:
4672:
4668:
4661:
4656:
4651:
4648:
4645:
4641:
4635:
4632:
4624:
4607:
4599:
4597:
4581:
4578:
4575:
4572:
4569:
4566:
4563:
4543:
4526:
4522:
4521:
4520:
4507:
4495:
4489:
4486:
4480:
4477:
4471:
4468:
4465:
4459:
4450:
4444:
4441:
4438:
4435:
4429:
4421:
4417:
4413:
4410:
4405:
4401:
4395:
4391:
4387:
4381:
4372:
4369:
4366:
4360:
4340:
4320:
4291:
4285:
4282:
4279:
4270:
4261:
4258:
4235:
4232:
4229:
4226:
4223:
4217:
4214:
4205:
4192:
4183:
4177:
4174:
4171:
4166:
4163:
4160:
4156:
4135:
4132:
4129:
4121:
4083:
4070:
4062:
4058:
4054:
4051:
4047:
4042:
4031:
4020:
4017:
4008:
4002:
3997:
3994:
3991:
3987:
3966:
3963:
3960:
3940:
3920:
3895:
3892:
3888:
3884:
3881:
3874:
3870:
3867:
3861:
3855:
3848:
3844:
3828:
3808:
3788:
3780:
3775:
3773:
3772:
3771:logistic loss
3767:
3763:
3745:
3741:
3718:
3714:
3705:
3701:
3693:
3691:
3674:
3671:
3668:
3662:
3654:
3635:
3632:
3629:
3623:
3600:
3597:
3594:
3577:
3568:
3552:
3532:
3512:
3504:
3503:
3497:
3495:
3491:
3487:
3468:
3440:
3420:
3417:
3414:
3406:
3390:
3382:
3381:KL divergence
3366:
3346:
3336:
3328:
3326:
3324:
3320:
3304:
3296:
3292:
3288:
3285:
3279:
3276:
3273:
3270:
3267:
3254:
3251:
3238:
3235:
3208:
3204:
3200:
3197:
3191:
3188:
3185:
3182:
3170:
3165:
3162:
3158:
3154:
3151:
3144:
3140:
3126:
3122:
3118:
3115:
3107:
3103:
3095:
3091:
3086:
3082:
3074:
3070:
3066:
3063:
3055:
3051:
3045:
3041:
3037:
3024:
3021:
3003:
2980:
2976:
2972:
2969:
2963:
2960:
2950:
2946:
2942:
2939:
2931:
2927:
2919:
2915:
2910:
2901:
2878:
2874:
2870:
2867:
2861:
2849:
2846:
2843:
2818:
2814:
2810:
2807:
2801:
2781:
2777:
2771:
2767:
2741:
2737:
2713:
2691:
2687:
2661:
2657:
2653:
2650:
2642:
2638:
2617:
2609:
2591:
2587:
2566:
2546:
2524:
2520:
2510:
2508:
2500:
2498:
2481:
2475:
2467:
2451:
2429:
2425:
2399:
2395:
2388:
2368:
2345:
2339:
2319:
2310:
2292:
2288:
2281:
2278:
2273:
2269:
2263:
2260:
2253:
2248:
2245:
2242:
2238:
2234:
2231:
2225:
2222:
2219:
2213:
2205:
2190:
2170:
2150:
2142:
2126:
2114:
2112:
2099:
2093:
2090:
2087:
2081:
2078:
2072:
2066:
2063:
2058:
2054:
2046:
2040:
2035:
2031:
2027:
2024:
2016:
2012:
2005:
2002:
1997:
1993:
1983:
1979:
1972:
1965:
1961:
1956:
1952:
1949:
1945:
1937:
1931:
1927:
1922:
1918:
1913:
1909:
1904:
1896:
1893:
1889:
1880:
1874:
1871:
1862:
1856:
1852:
1849:
1843:
1839:
1834:
1826:
1823:
1817:
1811:
1806:
1793:
1779:
1759:
1756:
1736:
1716:
1696:
1674:
1670:
1647:
1643:
1617:
1613:
1609:
1606:
1603:
1598:
1594:
1566:
1562:
1556:
1551:
1548:
1543:
1538:
1530:
1526:
1519:
1494:
1490:
1486:
1483:
1480:
1475:
1471:
1445:
1441:
1432:
1428:
1420:
1418:
1404:
1384:
1376:
1375:joint entropy
1357:
1354:
1351:
1345:
1335:
1317:
1314:
1302:
1296:
1293:
1290:
1283:
1277:
1266:
1262:
1259:
1253:
1250:
1247:
1241:
1232:
1229:
1216:
1210:
1207:
1204:
1201:
1195:
1190:
1182:
1179:
1167:
1161:
1158:
1155:
1148:
1142:
1131:
1127:
1119:
1105:
1085:
1065:
1045:
1025:
1017:
1014:
1010:
994:
974:
967:
963:
947:
927:
919:
913:
895:
889:
883:
880:
877:
870:
864:
852:
849:
845:
841:
838:
832:
829:
826:
820:
811:
810:, this means
786:
770:
750:
742:
737:
723:
715:
696:
690:
681:
668:
662:
659:
656:
639:
635:
629:
623:
620:
614:
611:
608:
602:
594:
580:
560:
552:
536:
516:
493:
490:
487:
470:
462:
457:
443:
435:
416:
408:
404:
394:
381:
375:
372:
369:
363:
358:
350:
347:
341:
338:
335:
329:
321:
307:
287:
275:
273:
259:
239:
231:
215:
195:
188:
184:
183:cross-entropy
180:
168:
163:
161:
156:
154:
149:
148:
146:
145:
140:
137:
135:
132:
130:
127:
125:
122:
121:
120:
119:
114:
111:
109:
106:
105:
104:
103:
98:
95:
93:
90:
88:
85:
83:
80:
78:
75:
73:
70:
68:
67:Joint entropy
65:
63:
60:
58:
55:
53:
50:
49:
48:
47:
43:
39:
38:
35:
31:
19:
7720:
7686:
7659:
7652:
7633:
7627:
7605:
7594:
7586:
7582:
7571:
7567:
7555:
7546:
7537:
7396:classifier,
7339:classifier,
7145:
7137:
5746:
5707:
5618:
5615:
5034:
5025:
5024:
5018:
4865:
4600:
4595:
4534:
4206:
3979:is given by
3776:
3769:
3765:
3761:
3704:optimization
3697:
3566:
3500:
3498:
3493:
3489:
3338:
2511:
2504:
2311:
2206:
2118:
1794:
1424:
1337:
1234:
1230:
1120:
915:
813:
738:
682:
595:
550:
458:
395:
322:
279:
185:between two
182:
176:
92:Entropy rate
3433:, which is
7731:Categories
7698:2007.08140
7618:2303.09935
7529:References
5744:, we have
4529:incorrect.
3492:(MCE), or
2900:perplexity
2509:function.
2115:Estimation
1421:Motivation
918:continuous
276:Definition