Marginal likelihood

A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample for all possible values of the parameters; it can be understood as the probability of the model itself and is therefore often referred to as the model evidence or simply the evidence.

Due to the integration over the parameter space, the marginal likelihood does not directly depend upon the parameters. If the focus is not on model comparison, the marginal likelihood is simply the normalizing constant that ensures that the posterior is a proper probability. It is related to the partition function in statistical mechanics.

Concept

Given a set of independent identically distributed data points X = (x_1, ..., x_n), where each x_i ~ p(x | θ) according to some probability distribution parameterized by θ, and θ is itself a random variable described by a distribution θ ~ p(θ | α), the marginal likelihood in general asks what the probability p(X | α) is, where θ has been marginalized out (integrated out):

    p(\mathbf{X} \mid \alpha) = \int_{\theta} p(\mathbf{X} \mid \theta)\, p(\theta \mid \alpha)\, \operatorname{d}\!\theta

The above definition is phrased in the context of Bayesian statistics, in which case p(θ | α) is called the prior density and p(X | θ) is the likelihood. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019). In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter θ = (ψ, λ), where ψ is the actual parameter of interest and λ is a non-interesting nuisance parameter. If there exists a probability distribution for λ, it is often desirable to consider the likelihood function only in terms of ψ, by marginalizing out λ:

    \mathcal{L}(\psi; \mathbf{X}) = p(\mathbf{X} \mid \psi) = \int_{\lambda} p(\mathbf{X} \mid \lambda, \psi)\, p(\lambda \mid \psi)\, \operatorname{d}\!\lambda
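As a concrete illustration of the integral above, here is a minimal sketch that estimates a marginal likelihood by simple Monte Carlo, averaging the likelihood over draws from the prior. The particular model (a normal mean with a normal prior), the hyperparameter values, and the sample size are hypothetical choices made only for this example, not part of the article.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical model: x_i ~ Normal(theta, 1), with prior theta ~ Normal(0, 2).
    # The prior mean and standard deviation play the role of the hyperparameters alpha.
    prior_mean, prior_sd = 0.0, 2.0
    x = rng.normal(loc=1.0, scale=1.0, size=20)   # observed data X (simulated here)

    # Simple Monte Carlo estimate of p(X | alpha): draw theta from the prior
    # and average the likelihood p(X | theta) over those draws.
    theta = rng.normal(prior_mean, prior_sd, size=200_000)
    log_lik = stats.norm.logpdf(x[:, None], loc=theta, scale=1.0).sum(axis=0)
    mc_estimate = np.exp(log_lik).mean()

    # Closed form for comparison: marginalizing out the normal mean gives a
    # multivariate normal for X with covariance I + prior_sd**2 * (all-ones matrix).
    n = len(x)
    cov = np.eye(n) + prior_sd**2 * np.ones((n, n))
    exact = stats.multivariate_normal(np.full(n, prior_mean), cov).pdf(x)

    print(mc_estimate, exact)   # the two values should roughly agree

Averaging raw likelihoods in this way becomes numerically unstable as the data set grows, since the likelihood underflows; this is one reason the specialized methods discussed below are used in practice.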
Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the marginalized-out parameter is the conjugate prior of the distribution of the data. In other cases, some kind of numerical integration method is needed, either a general method such as Gaussian integration or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.
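To illustrate the conjugate case, the sketch below computes the marginal likelihood of a Bernoulli sequence under a Beta prior, where the integral reduces to a ratio of Beta functions. The counts and prior parameters are hypothetical values chosen only for illustration.

    import numpy as np
    from scipy.special import betaln

    def log_marginal_likelihood(k, n, a, b):
        """Log marginal likelihood of an observed Bernoulli sequence with k
        successes in n trials under a conjugate Beta(a, b) prior on the
        success probability: p(X | a, b) = B(a + k, b + n - k) / B(a, b)."""
        return betaln(a + k, b + n - k) - betaln(a, b)

    # Hypothetical data and prior: 7 successes in 10 trials, uniform Beta(1, 1) prior.
    k, n, a, b = 7, 10, 1.0, 1.0
    exact = np.exp(log_marginal_likelihood(k, n, a, b))

    # Monte Carlo check: average the likelihood over draws from the prior.
    rng = np.random.default_rng(0)
    p = rng.beta(a, b, size=200_000)
    mc = np.mean(p**k * (1 - p)**(n - k))

    print(exact, mc)   # closed-form value and its Monte Carlo approximation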
It is also possible to apply the above considerations to a single random variable (data point) x, rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
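For example, in the Beta–Bernoulli setting used in the sketch above (an illustrative choice, not part of the original article), the prior predictive probability that a single new observation is a success has a simple closed form:

    p(x = 1 \mid a, b) = \int_{0}^{1} \theta\, \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a, b)}\, \operatorname{d}\!\theta = \frac{B(a+1, b)}{B(a, b)} = \frac{a}{a+b}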
Applications

Bayesian model comparison

In Bayesian model comparison, the marginalized variables θ are parameters for a particular type of model, and the remaining variable M is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing θ for the model parameters, the marginal likelihood for the model M is

    p(\mathbf{X} \mid M) = \int p(\mathbf{X} \mid \theta, M)\, p(\theta \mid M)\, \operatorname{d}\!\theta

It is in this context that the term model evidence is normally used. This quantity is important because the posterior odds ratio for a model M_1 against another model M_2 involves a ratio of marginal likelihoods, called the Bayes factor:

    \frac{p(M_1 \mid \mathbf{X})}{p(M_2 \mid \mathbf{X})} = \frac{p(M_1)}{p(M_2)}\, \frac{p(\mathbf{X} \mid M_1)}{p(\mathbf{X} \mid M_2)}

which can be stated schematically as

    posterior odds = prior odds × Bayes factor
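As a small numerical illustration of this ratio, the sketch below compares two hypothetical models for the Bernoulli data from the conjugate example above, differing only in their Beta priors; the prior odds of 1 and the specific prior parameters are assumptions made for this example.

    import numpy as np
    from scipy.special import betaln

    def log_evidence(k, n, a, b):
        # log p(X | M) for a Bernoulli sequence with k successes in n trials,
        # under a conjugate Beta(a, b) prior (exact, as in the conjugate example).
        return betaln(a + k, b + n - k) - betaln(a, b)

    k, n = 7, 10                               # hypothetical data
    log_m1 = log_evidence(k, n, 1.0, 1.0)      # M1: uniform prior on the success rate
    log_m2 = log_evidence(k, n, 20.0, 20.0)    # M2: prior concentrated near 0.5

    bayes_factor = np.exp(log_m1 - log_m2)     # p(X | M1) / p(X | M2)
    prior_odds = 1.0                           # p(M1) / p(M2), taken to be 1 here
    posterior_odds = prior_odds * bayes_factor # posterior odds = prior odds * Bayes factor

    print(bayes_factor, posterior_odds)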
See also

Empirical Bayes methods
Lindley's paradox
Marginal probability
Bayesian information criterion

References

de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". Bayesian Analysis. 14 (4): 1013–1036.
Šmídl, Václav; Quinn, Anthony (2006). "Bayesian Theory". The Variational Bayes Method in Signal Processing. Springer. pp. 13–23. doi:10.1007/3-540-28820-1_2.

Further reading

Charles S. Bos. "A comparison of marginal likelihood computation methods". In W. Härdle and B. Ronz, editors, COMPSTAT 2002: Proceedings in Computational Statistics, pp. 111–117. 2002. (Available as a preprint on SSRN 332860.)
Lambert, Ben (2018). "The devil is in the denominator". A Student's Guide to Bayesian Statistics. Sage. pp. 109–120. ISBN 978-1-4739-1636-4.
MacKay, David J.C. The on-line textbook Information Theory, Inference, and Learning Algorithms.
