Log probability

Logarithm of probabilities, useful for calculations

Not to be confused with log odds.

In probability theory and computer science, a log probability is simply a logarithm of a probability. The use of log probabilities means representing probabilities on a logarithmic scale $(-\infty, 0]$, instead of the standard unit interval $[0, 1]$.

Since the probabilities of independent events multiply, and logarithms convert multiplication to addition, log probabilities of independent events add. Log probabilities are thus practical for computations, and have an intuitive interpretation in terms of information theory: the negative expected value of the log probabilities is the information entropy of an event. Similarly, likelihoods are often transformed to the log scale, and the corresponding log-likelihood can be interpreted as the degree to which an event supports a statistical model. The log probability is widely used in implementations of computations with probability, and is studied as a concept in its own right in some applications of information theory, such as natural language processing.

Motivation

Representing probabilities in this way has several practical advantages:

Speed. Since multiplication is more expensive than addition, taking the product of a high number of probabilities is often faster if they are represented in log form. (The conversion to log form is expensive, but is only incurred once.) Multiplication arises from calculating the probability that multiple independent events occur: the probability that all independent events of interest occur is the product of all these events' probabilities.

Accuracy. The use of log probabilities improves numerical stability when the probabilities are very small, because of the way in which computers approximate real numbers.

Simplicity. Many probability distributions have an exponential form. Taking the log of these distributions eliminates the exponential function, unwrapping the exponent. For example, the log probability of the normal distribution's probability density function is $-((x-m_{x})/\sigma_{m})^{2}+C$ instead of $C_{2}\exp\left(-((x-m_{x})/\sigma_{m})^{2}\right)$. Log probabilities make some mathematical manipulations easier to perform.

Optimization. Since most common probability distributions, notably the exponential family, are only logarithmically concave, and concavity of the objective function plays a key role in the maximization of a function such as probability, optimizers work better with log probabilities.

Representation issues

The logarithm function is not defined for zero, so log probabilities can only represent non-zero probabilities. Since the logarithm of a number in the interval $(0, 1)$ is negative, often the negative log probabilities are used. In that case the log probabilities in the following formulas would be inverted.

Any base can be selected for the logarithm.

Basic manipulations

Further information: Log semiring

In this section we would name probabilities in logarithmic space $x'$ and $y'$ for short:

$$x' = \log(x) \in \mathbb{R}$$
$$y' = \log(y) \in \mathbb{R}$$

The product of probabilities $x \cdot y$ corresponds to addition in logarithmic space:

$$\log(x \cdot y) = \log(x) + \log(y) = x' + y'.$$

The sum of probabilities $x + y$ is a bit more involved to compute in logarithmic space, requiring the computation of one exponent and one logarithm.

However, in many applications a multiplication of probabilities (giving the probability of all independent events occurring) is used more often than their addition (giving the probability of at least one of mutually exclusive events occurring). Additionally, the cost of computing the addition can be avoided in some situations by simply using the highest probability as an approximation. Since probabilities are non-negative this gives a lower bound. This approximation is used in reverse to get a continuous approximation of the max function.

Addition in log space

$$
\begin{aligned}
&\log(x+y)\\
={}&\log(x+x\cdot y/x)\\
={}&\log(x+x\cdot \exp(\log(y/x)))\\
={}&\log(x\cdot (1+\exp(\log(y)-\log(x))))\\
={}&\log(x)+\log(1+\exp(\log(y)-\log(x)))\\
={}&x'+\log \left(1+\exp \left(y'-x'\right)\right)
\end{aligned}
$$

The formula above is more accurate than $\log\left(e^{x'}+e^{y'}\right)$, provided one takes advantage of the asymmetry in the addition formula. $x'$ should be the larger (least negative) of the two operands. This also produces the correct behavior if one of the operands is floating-point negative infinity, which corresponds to a probability of zero.

$$-\infty + \log\left(1+\exp\left(y'-(-\infty)\right)\right) = -\infty + \infty$$

This quantity is indeterminate, and will result in NaN.

$$x' + \log\left(1+\exp\left(-\infty - x'\right)\right) = x' + 0$$

This is the desired answer.

The above formula alone will incorrectly produce an indeterminate result in the case where both arguments are $-\infty$. This should be checked for separately to return $-\infty$.

For numerical reasons, one should use a function that computes $\log(1+x)$ (log1p) directly.

See also

Information content
Log-likelihood

References

Piech, Chris. "Probability for Computer scientists - Log probabilities". Retrieved 20 July 2023.
Kass, Robert E.; Vos, Paul W. (1997). Geometrical Foundations of Asymptotic Inference. New York: John Wiley & Sons. p. 14. ISBN 0-471-82668-5.
Papadopoulos, Alecos (September 25, 2013). "Why we always put log() before the joint pdf when we use MLE (Maximum likelihood Estimation)?". Stack Exchange.
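
Example: a minimal sketch of the two basic manipulations discussed in this article, written in Python with only the standard library. The function names `log_product` and `log_add` are hypothetical and chosen for illustration; they are not taken from the article or from any particular library. The sketch assumes natural logarithms and represents a probability of zero as floating-point negative infinity.

```python
import math


def log_product(log_probs):
    """Log of a product of probabilities: the sum of their logs."""
    # math.fsum keeps rounding error low when summing many terms.
    return math.fsum(log_probs)


def log_add(log_x, log_y):
    """Return log(x + y), given log_x = log(x) and log_y = log(y)."""
    # Put the larger (least negative) operand first, as the
    # addition formula requires.
    if log_x < log_y:
        log_x, log_y = log_y, log_x
    # After the swap, log_x == -inf means both probabilities are zero.
    # Handle this separately, since -inf - (-inf) would give NaN.
    if log_x == -math.inf:
        return -math.inf
    # x' + log(1 + exp(y' - x')), with log1p for accuracy near zero.
    return log_x + math.log1p(math.exp(log_y - log_x))


# Multiplying 100 probabilities of 1e-5 underflows to 0.0 in
# double precision, while the log-space sum stays finite.
tiny = [1e-5] * 100
direct = math.prod(tiny)
in_log_space = log_product([math.log(p) for p in tiny])

# Adding 0.2 and 0.3 in log space recovers log(0.5) up to rounding.
log_half = log_add(math.log(0.2), math.log(0.3))
```

Swapping the operands before applying the formula means the exponent `log_y - log_x` is never positive, so `math.exp` cannot overflow, and a single zero-probability operand falls through to `log_x + 0` without special handling.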

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.