Cross-entropy

In information theory, the cross-entropy between two probability distributions p and q, over the same underlying set of events, measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p.

Definition

The cross-entropy of the distribution q relative to a distribution p over a given set is defined as follows:

H(p, q) = -\operatorname{E}_p[\log q],

where \operatorname{E}_p[\cdot] is the expected value operator with respect to the distribution p.

The definition may be formulated using the Kullback–Leibler divergence D_{\mathrm{KL}}(p \parallel q), the divergence of p from q (also known as the relative entropy of p with respect to q):

H(p, q) = H(p) + D_{\mathrm{KL}}(p \parallel q),

where H(p) is the entropy of p.

For discrete probability distributions p and q with the same support \mathcal{X}, this means

H(p, q) = -\sum_{x \in \mathcal{X}} p(x) \log q(x).   (Eq. 1)

The situation for continuous distributions is analogous. We have to assume that p and q are absolutely continuous with respect to some reference measure r (usually r is a Lebesgue measure on a Borel σ-algebra). Let P and Q be probability density functions of p and q with respect to r. Then

-\int_{\mathcal{X}} P(x) \log Q(x)\, \mathrm{d}x = \operatorname{E}_p[-\log Q],

and therefore

H(p, q) = -\int_{\mathcal{X}} P(x) \log Q(x)\, \mathrm{d}x.   (Eq. 2)

NB: The notation H(p, q) is also used for a different concept, the joint entropy of p and q.

Motivation

In information theory, the Kraft–McMillan theorem establishes that any directly decodable coding scheme for coding a message to identify one value x_i out of a set of possibilities \{x_1, \ldots, x_n\} can be seen as representing an implicit probability distribution q(x_i) = \left(\tfrac{1}{2}\right)^{\ell_i} over \{x_1, \ldots, x_n\}, where \ell_i is the length of the code for x_i in bits. Therefore, cross-entropy can be interpreted as the expected message-length per datum when a wrong distribution q is assumed while the data actually follows a distribution p. That is why the expectation is taken over the true probability distribution p and not q. Indeed the expected message-length under the true distribution p is

\operatorname{E}_p[\ell] = -\operatorname{E}_p\left[\frac{\ln q(x)}{\ln 2}\right] = -\operatorname{E}_p\left[\log_2 q(x)\right] = -\sum_{x_i} p(x_i) \log_2 q(x_i) = -\sum_x p(x) \log_2 q(x) = H(p, q).
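The discrete formula (Eq. 1) and the decomposition H(p, q) = H(p) + D_KL(p ∄ q) are easy to check numerically. The following Python sketch is illustrative only; the two distributions are made-up examples, not values from the article:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = [0.5, 0.25, 0.125, 0.125]   # true distribution (example values)
q = [0.25, 0.25, 0.25, 0.25]    # assumed distribution (example values)

print(cross_entropy(p, q))               # H(p, q)
print(entropy(p) + kl_divergence(p, q))  # same value: H(p) + D_KL(p || q)
print(cross_entropy(p, q) / np.log(2))   # the same quantity expressed in bits
```

Dividing by log 2 converts nats to bits, which matches the coding interpretation in the Motivation section: with q uniform over four outcomes every event is coded with 2 bits, so the expected message length is 2 bits even though H(p) is only 1.75 bits.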
Estimation

There are many situations where cross-entropy needs to be measured but the distribution of p is unknown. An example is language modeling, where a model is created based on a training set T, and then its cross-entropy is measured on a test set to assess how accurate the model is in predicting the test data. In this example, p is the true distribution of words in any corpus, and q is the distribution of words as predicted by the model. Since the true distribution is unknown, cross-entropy cannot be directly calculated. In these cases, an estimate of cross-entropy is calculated using the following formula:

H(T, q) = -\sum_{i=1}^{N} \frac{1}{N} \log_2 q(x_i),

where N is the size of the test set, and q(x) is the probability of event x estimated from the training set. In other words, q(x_i) is the probability estimate of the model that the i-th word of the text is x_i. The sum is averaged over the N words of the test. This is a Monte Carlo estimate of the true cross-entropy, where the test set is treated as samples from p(x).
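A minimal sketch of this Monte Carlo estimate, assuming a toy vocabulary, a made-up model distribution q, and a made-up test sequence (none of these values come from the article):

```python
import math

# Hypothetical model: estimated unigram probabilities over a toy vocabulary.
q = {"the": 0.4, "cat": 0.2, "sat": 0.2, "mat": 0.1, "dog": 0.1}

# Hypothetical test sequence, treated as samples from the unknown true distribution p.
test_words = ["the", "cat", "sat", "the", "mat", "the", "dog", "sat"]

N = len(test_words)
# H(T, q) = -(1/N) * sum_i log2 q(x_i), in bits per word.
estimate = -sum(math.log2(q[w]) for w in test_words) / N
print(f"estimated cross-entropy: {estimate:.3f} bits per word")
```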
Relation to maximum likelihood

The cross-entropy arises in classification problems when introducing a logarithm in the guise of the log-likelihood function.

The section is concerned with the subject of estimation of the probability of different possible discrete outcomes. To this end, denote a parametrized family of distributions by q_\theta, with \theta subject to the optimization effort. Consider a given finite sequence of N values x_i from a training set, obtained from conditionally independent sampling. The likelihood assigned to any considered parameter \theta of the model is then given by the product over all probabilities q_\theta(X = x_i). Repeated occurrences are possible, leading to equal factors in the product. If the count of occurrences of the value equal to x_i (for some index i) is denoted by \#x_i, then the frequency of that value equals \#x_i / N. Denote the latter by p(X = x_i), as it may be understood as an empirical approximation to the probability distribution underlying the scenario. Further denote by PP := \mathrm{e}^{H(p, q_\theta)} the perplexity, which can be seen to equal \prod_{x_i} q_\theta(X = x_i)^{-p(X = x_i)} by the calculation rules for the logarithm, where the product is taken over the values without double counting. So

\mathcal{L}(\theta; \mathbf{x}) = \prod_i q_\theta(X = x_i) = \prod_{x_i} q_\theta(X = x_i)^{\#x_i} = PP^{-N} = \mathrm{e}^{-N \cdot H(p, q_\theta)}

or

\log \mathcal{L}(\theta; \mathbf{x}) = -N \cdot H(p, q_\theta).

Since the logarithm is a monotonically increasing function, it does not affect extremization. So observe that the likelihood maximization amounts to minimization of the cross-entropy.
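The identity log L(θ; x) = āˆ’N Ā· H(p, q_θ) can be verified directly on a small sample. The sample and the model distribution q_θ below are made-up illustrations:

```python
import math
from collections import Counter

sample = ["a", "b", "a", "c", "a", "b"]          # hypothetical observations
q_theta = {"a": 0.5, "b": 0.3, "c": 0.2}         # hypothetical model distribution

N = len(sample)
p_emp = {x: c / N for x, c in Counter(sample).items()}  # empirical distribution p

log_likelihood = sum(math.log(q_theta[x]) for x in sample)
cross_entropy = -sum(p_emp[x] * math.log(q_theta[x]) for x in p_emp)  # H(p, q_theta), nats

print(log_likelihood)      # log L(theta; x)
print(-N * cross_entropy)  # identical, up to floating-point rounding
```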
Cross-entropy minimization

Cross-entropy minimization is frequently used in optimization and rare-event probability estimation. When comparing a distribution q against a fixed reference distribution p, cross-entropy and KL divergence are identical up to an additive constant (since p is fixed): according to Gibbs' inequality, both take on their minimal values when p = q, which is 0 for KL divergence and H(p) for cross-entropy. In the engineering literature, the principle of minimizing KL divergence (Kullback's "Principle of Minimum Discrimination Information") is often called the Principle of Minimum Cross-Entropy (MCE), or Minxent.

However, as discussed in the article Kullback–Leibler divergence, sometimes the distribution q is the fixed prior reference distribution, and the distribution p is optimized to be as close to q as possible, subject to some constraint. In this case the two minimizations are not equivalent. This has led to some ambiguity in the literature, with some authors attempting to resolve the inconsistency by restating cross-entropy to be D_{\mathrm{KL}}(p \parallel q), rather than H(p, q). In fact, cross-entropy is another name for relative entropy; see Cover and Thomas and Good. On the other hand, H(p, q) does not agree with the literature and can be misleading.

Cross-entropy loss function and logistic regression

Cross-entropy can be used to define a loss function in machine learning and optimization. Mao, Mohri, and Zhong (2023) give an extensive analysis of the properties of the family of cross-entropy loss functions in machine learning, including theoretical learning guarantees and extensions to adversarial learning. The true probability p_i is the true label, and the given distribution q_i is the predicted value of the current model. This is also known as the log loss (or logarithmic loss or logistic loss); the terms "log loss" and "cross-entropy loss" are used interchangeably.

More specifically, consider a binary regression model which can be used to classify observations into two possible classes (often simply labelled 0 and 1). The output of the model for a given observation, given a vector of input features x, can be interpreted as a probability, which serves as the basis for classifying the observation. In logistic regression, the probability is modeled using the logistic function g(z) = 1/(1 + e^{-z}), where z is some function of the input vector x, commonly just a linear function. The probability of the output y = 1 is given by

q_{y=1} = \hat{y} \equiv g(\mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},

where the vector of weights \mathbf{w} is optimized through some appropriate algorithm such as gradient descent. Similarly, the complementary probability of finding the output y = 0 is simply given by

q_{y=0} = 1 - \hat{y}.

Having set up our notation, p \in \{y, 1 - y\} and q \in \{\hat{y}, 1 - \hat{y}\}, we can use cross-entropy to get a measure of dissimilarity between p and q:

H(p, q) = -\sum_i p_i \log q_i = -y \log \hat{y} - (1 - y) \log(1 - \hat{y}).

[Figure: different loss functions that can be used to train a binary classifier, shown only for the case where the target output is 1; each loss is zero when the output equals the target and increases as the output becomes increasingly incorrect.]

Logistic regression typically optimizes the log loss for all the observations on which it is trained, which is the same as optimizing the average cross-entropy in the sample. Other loss functions that penalize errors differently can also be used for training, resulting in models with different final test accuracy. For example, suppose we have N samples with each sample indexed by n = 1, \ldots, N. The average of the loss function is then given by:

J(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right],

where \hat{y}_n \equiv g(\mathbf{w} \cdot \mathbf{x}_n) = 1/(1 + e^{-\mathbf{w} \cdot \mathbf{x}_n}), with g(z) the logistic function as before.

The logistic loss is sometimes called cross-entropy loss. It is also known as log loss. (In this case, the binary label is often denoted by {āˆ’1, +1}.)
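A minimal sketch of the average loss J(w) for logistic regression on a tiny, made-up batch (the design matrix, labels, and weight vector are illustrative assumptions, not data from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    """J(w): average binary cross-entropy between labels y and predictions g(X w)."""
    y_hat = sigmoid(X @ w)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

X = np.array([[1.0,  0.5,  1.2],    # first column is the intercept term
              [1.0, -1.0,  0.3],
              [1.0,  2.0, -0.7],
              [1.0,  0.1,  0.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.1, 0.8, -0.3])      # made-up weight vector

print(log_loss(w, X, y))
```

In practice the same quantity is available from libraries, for example sklearn.metrics.log_loss, which is listed in the references.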
Remark: The gradient of the cross-entropy loss for logistic regression is the same as the gradient of the squared-error loss for linear regression. That is, define

X^{\mathsf{T}} = \begin{pmatrix} 1 & x_{11} & \dots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix} \in \mathbb{R}^{n \times (p+1)},

\hat{y}_i = \hat{f}(x_{i1}, \dots, x_{ip}) = \frac{1}{1 + \exp(-\beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip})},

L(\boldsymbol{\beta}) = -\sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right].

Then we have the result

\frac{\partial}{\partial \boldsymbol{\beta}} L(\boldsymbol{\beta}) = X^{T}(\hat{Y} - Y).

The proof is as follows. For any \hat{y}_i, we have

\frac{\partial}{\partial \beta_0} \ln \frac{1}{1 + e^{-\beta_0 + k_0}} = \frac{e^{-\beta_0 + k_0}}{1 + e^{-\beta_0 + k_0}},

\frac{\partial}{\partial \beta_0} \ln \left( 1 - \frac{1}{1 + e^{-\beta_0 + k_0}} \right) = \frac{-1}{1 + e^{-\beta_0 + k_0}},

\frac{\partial}{\partial \beta_0} L(\boldsymbol{\beta}) = -\sum_{i=1}^{N} \left[ y_i \frac{e^{-\beta_0 + k_0}}{1 + e^{-\beta_0 + k_0}} - (1 - y_i) \frac{1}{1 + e^{-\beta_0 + k_0}} \right] = -\sum_{i=1}^{N} \left[ y_i - \hat{y}_i \right] = \sum_{i=1}^{N} (\hat{y}_i - y_i),

\frac{\partial}{\partial \beta_1} \ln \frac{1}{1 + e^{-\beta_1 x_{i1} + k_1}} = \frac{x_{i1} e^{k_1}}{e^{\beta_1 x_{i1}} + e^{k_1}},

\frac{\partial}{\partial \beta_1} \ln \left[ 1 - \frac{1}{1 + e^{-\beta_1 x_{i1} + k_1}} \right] = \frac{-x_{i1} e^{\beta_1 x_{i1}}}{e^{\beta_1 x_{i1}} + e^{k_1}},

\frac{\partial}{\partial \beta_1} L(\boldsymbol{\beta}) = -\sum_{i=1}^{N} x_{i1} (y_i - \hat{y}_i) = \sum_{i=1}^{N} x_{i1} (\hat{y}_i - y_i).

In a similar way, we eventually obtain the desired result.
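The closed-form gradient X^T(Ŷ āˆ’ Y) can be checked against a finite-difference approximation. Everything below (data, weights, tolerances) is a made-up illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(beta, X, y):
    """L(beta) = -sum_i [y_i log yhat_i + (1 - y_i) log(1 - yhat_i)]."""
    y_hat = sigmoid(X @ beta)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
n, p = 6, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])   # design matrix with intercept
y = rng.integers(0, 2, size=n).astype(float)
beta = rng.normal(size=p + 1)

analytic = X.T @ (sigmoid(X @ beta) - y)        # X^T (Y_hat - Y)

eps = 1e-6                                      # central finite differences
numeric = np.array([(loss(beta + eps * e, X, y) - loss(beta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(p + 1)])

print(np.allclose(analytic, numeric, atol=1e-4))  # True: the two gradients agree
```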
Amended cross-entropy

It may be beneficial to train an ensemble of models that have diversity, such that when they are combined, their predictive accuracy is augmented. Assuming a simple ensemble of K classifiers is assembled via averaging the outputs, then the amended cross-entropy is given by

e^{k} = H(p, q^{k}) - \frac{\lambda}{K} \sum_{j \neq k} H(q^{j}, q^{k}),

where e^{k} is the cost function of the k-th classifier, q^{k} is the output probability of the k-th classifier, p is the true probability to be estimated, and \lambda is a parameter between 0 and 1 that defines the 'diversity' that we would like to establish among the ensemble. When \lambda = 0 we want each classifier to do its best regardless of the ensemble, and when \lambda = 1 we would like the classifier to be as diverse as possible.
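A minimal sketch of the amended cost e^k for one member of a three-classifier ensemble on a single observation; the predicted distributions and the value of Ī» are made-up illustrations:

```python
import numpy as np

def cross_entropy(p, q):
    return -np.sum(np.asarray(p, dtype=float) * np.log(np.asarray(q, dtype=float)))

p  = np.array([1.0, 0.0, 0.0])            # one-hot true label (illustrative)
qs = [np.array([0.7, 0.2, 0.1]),          # outputs of the K = 3 classifiers
      np.array([0.6, 0.3, 0.1]),
      np.array([0.8, 0.1, 0.1])]

K, lam, k = len(qs), 0.5, 0               # cost for classifier k = 0 with lambda = 0.5
penalty = sum(cross_entropy(qs[j], qs[k]) for j in range(K) if j != k)
e_k = cross_entropy(p, qs[k]) - (lam / K) * penalty
print(e_k)
```

With Ī» = 0 the cost reduces to the ordinary cross-entropy of classifier k; larger Ī» rewards predictions that differ from the rest of the ensemble.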
See also

Cross-entropy method
Logistic regression
Conditional entropy
Kullback–Leibler distance
Maximum-likelihood estimation
Mutual information

References

Murphy, Kevin (2012). Machine Learning: A Probabilistic Perspective. MIT Press. ISBN 978-0262018029.
Thomas M. Cover, Joy A. Thomas. Elements of Information Theory, 2nd Edition. Wiley. p. 80.
I. J. Good (1963). "Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables". Annals of Mathematical Statistics.
Anqi Mao, Mehryar Mohri, Yutao Zhong (2023). "Cross-entropy loss functions: Theoretical analysis and applications". ICML 2023. https://arxiv.org/pdf/2304.07288.pdf
Noel, Mathew; Banerjee, Arindam; D, Geraldine Bessie Amali; Muthiah-Nakarajan, Venkataraman (March 17, 2023). "Alternate loss functions for classification and robust regression can improve the accuracy of artificial neural networks". arXiv:2303.09935.
Jason Brownlee (2019). Probability for Machine Learning: Discover How To Harness Uncertainty With Python. p. 220: "Logistic loss refers to the loss function commonly used to optimize a logistic regression model. It may also be referred to as logarithmic loss (which is confusing) or simply log loss."
sklearn.metrics.log_loss, scikit-learn documentation.
George Cybenko, Dianne P. O'Leary, Jorma Rissanen (1999). The Mathematics of Information Coding, Extraction and Distribution. p. 82.
Shoham, Ron; Permuter, Haim H. (2019). "Amended Cross-Entropy Cost: An Approach for Encouraging Diversity in Classification Ensemble (Brief Announcement)". In Dolev, Shlomi; Hendler, Danny; Lodha, Sachin; Yung, Moti (eds.). Cyber Security Cryptography and Machine Learning – Third International Symposium, CSCML 2019, Beer-Sheva, Israel, June 27–28, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11527. Springer. pp. 202–207. doi:10.1007/978-3-030-20951-3_18. ISBN 978-3-030-20950-6.
Shoham, Ron; Permuter, Haim (2020). "Amended Cross Entropy Cost: Framework For Explicit Diversity Encouragement". arXiv:2007.08140.

Further reading

de Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). "A tutorial on the cross-entropy method". Annals of Operations Research. 134 (1), 19–67.
