A marginal likelihood is a likelihood function that has been integrated over the parameter space. In Bayesian statistics, it represents the probability of generating the observed sample for all possible values of the parameters; it can be understood as the probability of the model itself and is therefore often referred to as the model evidence or simply evidence.

Due to the integration over the parameter space, the marginal likelihood does not directly depend upon the parameters. If the focus is not on model comparison, the marginal likelihood is simply the normalizing constant that ensures that the posterior is a proper probability. It is related to the partition function in statistical mechanics.
Concept

Given a set of independent identically distributed data points {\displaystyle \mathbf {X} =(x_{1},\ldots ,x_{n}),} where {\displaystyle x_{i}\sim p(x|\theta )} according to some probability distribution parameterized by θ, where θ itself is a random variable described by a distribution, i.e. {\displaystyle \theta \sim p(\theta \mid \alpha ),} the marginal likelihood in general asks what the probability {\displaystyle p(\mathbf {X} \mid \alpha )} is, where θ has been marginalized out (integrated out):
{\displaystyle p(\mathbf {X} \mid \alpha )=\int _{\theta }p(\mathbf {X} \mid \theta )\,p(\theta \mid \alpha )\ \operatorname {d} \!\theta }
The above definition is phrased in the context of Bayesian statistics, in which case {\displaystyle p(\theta \mid \alpha )} is called the prior density and {\displaystyle p(\mathbf {X} \mid \theta )} is the likelihood. The marginal likelihood quantifies the agreement between data and prior in a geometric sense made precise in de Carvalho et al. (2019).
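As a concrete illustration of the integral above, here is a minimal sketch (not part of the original article): for Bernoulli data with an assumed Beta(2, 2) prior on θ, the marginal likelihood p(X | α) can be approximated by drawing θ from the prior and averaging the likelihood over the draws. The data values, hyperparameters, and sample count are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: Bernoulli data with a Beta(2, 2) prior on theta.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed sample X
alpha_a, alpha_b = 2.0, 2.0              # hyperparameters alpha

# Draw theta ~ p(theta | alpha) and average the likelihood p(X | theta):
#   p(X | alpha) ~= (1/S) * sum_s p(X | theta_s)
theta = rng.beta(alpha_a, alpha_b, size=100_000)
lik = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())
print("Monte Carlo estimate of p(X | alpha):", lik.mean())
```

The same estimator applies to any model whose likelihood can be evaluated pointwise, although its variance grows when the prior and the likelihood disagree strongly.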
In classical (frequentist) statistics, the concept of marginal likelihood occurs instead in the context of a joint parameter {\displaystyle \theta =(\psi ,\lambda )}, where ψ is the actual parameter of interest and λ is a non-interesting nuisance parameter. If there exists a probability distribution for λ, it is often desirable to consider the likelihood function only in terms of ψ, by marginalizing out λ:
{\displaystyle {\mathcal {L}}(\psi ;\mathbf {X} )=p(\mathbf {X} \mid \psi )=\int _{\lambda }p(\mathbf {X} \mid \lambda ,\psi )\,p(\lambda \mid \psi )\ \operatorname {d} \!\lambda }
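To make this marginalization concrete, a hypothetical sketch follows, with these assumptions: a single observation x ~ N(ψ + λ, 1) and a standard normal distribution for the nuisance parameter λ. It evaluates L(ψ; x) by numerical quadrature and checks the result against the closed form implied by the model, x | ψ ~ N(ψ, √2).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

x_obs = 1.3  # assumed single observed data point

def marginal_likelihood(psi):
    """L(psi; x) = integral of p(x | lambda, psi) p(lambda) d lambda."""
    def integrand(lam):
        # likelihood p(x | lambda, psi) times the nuisance prior p(lambda)
        return stats.norm.pdf(x_obs, loc=psi + lam) * stats.norm.pdf(lam)
    value, _abserr = quad(integrand, -10.0, 10.0)  # effectively the real line
    return value

# Marginalizing lambda gives x | psi ~ N(psi, sqrt(2)); the two should agree:
print(marginal_likelihood(1.0))
print(stats.norm.pdf(x_obs, loc=1.0, scale=np.sqrt(2.0)))
```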
Unfortunately, marginal likelihoods are generally difficult to compute. Exact solutions are known for a small class of distributions, particularly when the marginalized-out parameter is the conjugate prior of the distribution of the data. In other cases, some kind of numerical integration method is needed: either a general method such as Gaussian integration or a Monte Carlo method, or a method specialized to statistical problems such as the Laplace approximation, Gibbs/Metropolis sampling, or the EM algorithm.
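For the conjugate case mentioned above, the integral is available in closed form. Continuing the assumed Beta–Bernoulli setup from the earlier snippet, the marginal likelihood is a ratio of Beta functions, evaluated below in log space for numerical stability.

```python
import numpy as np
from scipy.special import betaln

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
a, b = 2.0, 2.0                      # assumed Beta prior hyperparameters
k, n = int(x.sum()), len(x)          # successes, trials

# For Bernoulli data under a Beta(a, b) prior:
#   p(X | alpha) = B(a + k, b + n - k) / B(a, b)
log_evidence = betaln(a + k, b + n - k) - betaln(a, b)
print("Exact p(X | alpha):", np.exp(log_evidence))
```

This value can be checked against the Monte Carlo estimate from the first snippet.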
It is also possible to apply the above considerations to a single random variable (data point) x, rather than a set of observations. In a Bayesian context, this is equivalent to the prior predictive distribution of a data point.
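Under the same assumed Beta–Bernoulli setup, the prior predictive probability of a single observation follows immediately:

```python
a, b = 2.0, 2.0  # assumed Beta prior hyperparameters
# Prior predictive of one Bernoulli data point:
#   p(x = 1) = integral of theta * p(theta | a, b) d theta = E[theta] = a / (a + b)
print("p(x = 1) =", a / (a + b))
```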
Applications

Bayesian model comparison

In Bayesian model comparison, the marginalized variables θ are parameters for a particular type of model, and the remaining variable M is the identity of the model itself. In this case, the marginalized likelihood is the probability of the data given the model type, not assuming any particular model parameters. Writing θ for the model parameters, the marginal likelihood for the model M is
{\displaystyle p(\mathbf {X} \mid M)=\int p(\mathbf {X} \mid \theta ,M)\,p(\theta \mid M)\,\operatorname {d} \!\theta }
It is in this context that the term model evidence is normally used. This quantity is important because the posterior odds ratio for a model M1 against another model M2 involves a ratio of marginal likelihoods, called the Bayes factor:

{\displaystyle {\frac {p(M_{1}\mid \mathbf {X} )}{p(M_{2}\mid \mathbf {X} )}}={\frac {p(M_{1})}{p(M_{2})}}\,{\frac {p(\mathbf {X} \mid M_{1})}{p(\mathbf {X} \mid M_{2})}}}

which can be stated schematically as

posterior odds = prior odds × Bayes factor
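As a hedged illustration of this ratio (not from the original article), the sketch below compares two hypothetical Bernoulli models that differ only in their priors: M1 with a flat Beta(1, 1) prior and M2 with a Beta(20, 20) prior concentrated near a fair coin. Each evidence term uses the conjugate closed form from the earlier snippet; the data are assumed for the example.

```python
import numpy as np
from scipy.special import betaln

x = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])   # assumed coin-flip data
k, n = int(x.sum()), len(x)

def log_evidence(a, b):
    # log p(X | M) for Bernoulli data under a Beta(a, b) prior on theta
    return betaln(a + k, b + n - k) - betaln(a, b)

log_bf = log_evidence(1, 1) - log_evidence(20, 20)   # M1 versus M2
print("Bayes factor p(X|M1)/p(X|M2):", np.exp(log_bf))
# posterior odds = prior odds * Bayes factor; with equal prior odds,
# the posterior odds equal the Bayes factor itself.
```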
See also

Bayes factor
Bayesian information criterion
Empirical Bayes methods
Lindley's paradox
Marginal probability
References

Šmídl, Václav; Quinn, Anthony (2006). "Bayesian Theory". The Variational Bayes Method in Signal Processing. Springer. pp. 13–23. doi:10.1007/3-540-28820-1_2.
de Carvalho, Miguel; Page, Garritt; Barney, Bradley (2019). "On the geometry of Bayesian inference". Bayesian Analysis. 14 (4): 1013–1036. (Available as a preprint on the web.)
Further reading

Charles S. Bos. "A comparison of marginal likelihood computation methods". In W. Härdle and B. Ronz, editors, pp. 111–117. 2002. (Available as a preprint on SSRN 332860.)
Lambert, Ben (2018). "The devil is in the denominator". A Student's Guide to Bayesian Statistics. Sage. pp. 109–120. ISBN 978-1-4739-1636-4.
The on-line textbook: Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay.