
Dynamic topic model


Within statistics, dynamic topic models are generative models that can be used to analyze the evolution of (unobserved) topics of a collection of documents over time. This family of models was proposed by David Blei and John Lafferty and is an extension to Latent Dirichlet Allocation (LDA) that can handle sequential documents.

In LDA, the model is oblivious both to the order in which words appear within a document and to the order in which documents appear in the corpus. Whereas words are still assumed to be exchangeable, in a dynamic topic model the order of the documents plays a fundamental role. More precisely, the documents are grouped by time slice (e.g. years), and it is assumed that the documents of each group come from a set of topics that evolved from the set of the previous slice.

Topics

Similarly to LDA and pLSA, in a dynamic topic model each document is viewed as a mixture of unobserved topics. Furthermore, each topic defines a multinomial distribution over a set of terms. Thus, for each word of each document, a topic is drawn from the mixture and a term is subsequently drawn from the multinomial distribution corresponding to that topic.

The topics, however, evolve over time. For instance, the two most likely terms of a topic at time t could be "network" and "Zipf" (in descending order), while the most likely ones at time t+1 could be "Zipf" and "percolation" (in descending order).
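
The drawing mechanism described above can be made concrete with a short sketch (the mixture weights, topics, and vocabulary below are invented for illustration and are not from the original paper):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: 2 topics over a 4-term vocabulary.
    vocab = ["network", "Zipf", "percolation", "graph"]
    doc_topic_mix = np.array([0.7, 0.3])           # the document's mixture of topics
    topic_term = np.array([[0.5, 0.3, 0.1, 0.1],   # term distribution of topic 0
                           [0.1, 0.2, 0.4, 0.3]])  # term distribution of topic 1

    # For each word: draw a topic from the mixture, then a term from that topic.
    for _ in range(5):
        z = rng.choice(2, p=doc_topic_mix)
        print(vocab[rng.choice(4, p=topic_term[z])])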

Model

Define α_t as the per-document topic distribution at time t, β_{t,k} as the word distribution of topic k at time t, η_{t,d} as the topic distribution for document d at time t, z_{t,d,n} as the topic for the n-th word in document d at time t, and w_{t,d,n} as the specific word.

In this model, the multinomial distributions α_{t+1} and β_{t+1,k} are generated from α_t and β_{t,k}, respectively. Even though multinomial distributions are usually written in terms of the mean parameters, representing them in terms of the natural parameters is better in the context of dynamic topic models.

The mean-parameter representation has some disadvantages, because the parameters are constrained to be non-negative and to sum to one. When defining the evolution of these distributions, one would need to ensure that such constraints remain satisfied. Since both distributions are in the exponential family, one solution to this problem is to represent them in terms of the natural parameters, which can assume any real value and can be changed individually.
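
The advantage of the natural parameterization can be seen in a small sketch (illustrative numpy code with invented numbers, not the authors' implementation): Gaussian noise added to natural parameters always maps back to a valid distribution, whereas the same noise applied to mean parameters can break the constraints.

    import numpy as np

    rng = np.random.default_rng(1)

    def to_mean(x):
        # Map natural parameters to mean parameters (softmax).
        e = np.exp(x - x.max())   # subtract the max for numerical stability
        return e / e.sum()

    mean = np.array([0.6, 0.3, 0.1])   # mean parameters of a multinomial

    # Perturbing the mean parameters directly can leave the simplex:
    bad = mean + rng.normal(scale=0.2, size=3)
    print(bad, bad.sum())              # entries may go negative, sum drifts from 1

    # Perturbing the natural parameters is always safe:
    good = to_mean(np.log(mean) + rng.normal(scale=0.2, size=3))
    print(good, good.sum())            # non-negative, sums to 1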

Using the natural parameterization, the dynamics of the topic model are given by

β_{t,k} | β_{t-1,k} ~ N(β_{t-1,k}, σ^2 I)

and

α_t | α_{t-1} ~ N(α_{t-1}, δ^2 I).

The generative process at time slice 't' is therefore:

1. Draw topics β_{t,k} | β_{t-1,k} ~ N(β_{t-1,k}, σ^2 I) for every topic k.
2. Draw mixture model α_t | α_{t-1} ~ N(α_{t-1}, δ^2 I).
3. For each document:
   1. Draw η_{t,d} ~ N(α_t, a^2 I).
   2. For each word:
      1. Draw topic z_{t,d,n} ~ Mult(π(η_{t,d})).
      2. Draw word w_{t,d,n} ~ Mult(π(β_{t,z_{t,d,n}})).

where π(x) is a mapping from the natural parameterization to the mean parameterization, namely

π(x)_i = exp(x_i) / Σ_j exp(x_j).

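The dynamics and the sampling steps above can be put together in a short simulation (a sketch with invented toy dimensions and variances, not the authors' code):

    import numpy as np

    rng = np.random.default_rng(2)

    def pi(x):
        # Natural-to-mean mapping: softmax over the last axis.
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    K, V, T = 3, 8, 4                  # topics, vocabulary size, time slices
    D, N = 5, 20                       # documents per slice, words per document
    sigma, delta, a = 0.1, 0.1, 0.5    # standard deviations of the Gaussian steps

    beta = rng.normal(size=(K, V))     # natural parameters of the topics
    alpha = rng.normal(size=K)         # natural parameters of the proportions

    corpus = []
    for t in range(T):
        # Topic and proportion parameters drift from the previous slice.
        beta = beta + rng.normal(scale=sigma, size=(K, V))
        alpha = alpha + rng.normal(scale=delta, size=K)
        docs = []
        for d in range(D):
            eta = alpha + rng.normal(scale=a, size=K)     # eta ~ N(alpha_t, a^2 I)
            z = rng.choice(K, size=N, p=pi(eta))          # z ~ Mult(pi(eta))
            docs.append([rng.choice(V, p=pi(beta[k])) for k in z])  # w ~ Mult(pi(beta_z))
        corpus.append(docs)
        print(f"slice {t}: top terms of topic 0 = {np.argsort(pi(beta[0]))[::-1][:3]}")
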
Inference

In the dynamic topic model, only the words w_{t,d,n} are observable. Learning the other parameters constitutes an inference problem. Blei and Lafferty argue that applying Gibbs sampling to do inference in this model is more difficult than in static models, due to the nonconjugacy of the Gaussian and multinomial distributions. They propose the use of variational methods, in particular the Variational Kalman Filtering and the Variational Wavelet Regression.

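In practice one rarely implements these variational updates from scratch; for example, the gensim library ships a dynamic topic model as LdaSeqModel. Below is a usage sketch with a hypothetical toy corpus (check the gensim documentation for the exact API of your version):

    from gensim.corpora import Dictionary
    from gensim.models import LdaSeqModel

    # Hypothetical toy corpus: documents ordered by time, oldest first.
    docs = [["network", "graph", "zipf"],
            ["zipf", "network", "law"],
            ["zipf", "percolation", "law"],
            ["percolation", "cluster", "zipf"]]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # time_slice gives the number of documents in each slice:
    # here, two slices with two documents each.
    model = LdaSeqModel(corpus=corpus, id2word=dictionary,
                        time_slice=[2, 2], num_topics=2)

    # Inspect how topic 0 evolves across the two slices.
    print(model.print_topic_times(topic=0))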

Applications

In the original paper, a dynamic topic model is applied to the corpus of Science articles published between 1881 and 1999, aiming to show that this method can be used to analyze the trends of word usage inside topics. The authors also show that the model trained with past documents is able to fit documents of an incoming year better than LDA.

A continuous dynamic topic model was developed by Wang et al. and applied to predict the timestamp of documents.

Going beyond text documents, dynamic topic models were used to study musical influence, by learning musical topics and how they evolve in recent history.

References

Blei, David M.; Lafferty, John D. (2006). "Dynamic topic models". Proceedings of the 23rd International Conference on Machine Learning - ICML '06. pp. 113-120. doi:10.1145/1143844.1143859. ISBN 978-1-59593-383-6. S2CID 5405229.
Wang, Chong; Blei, David; Heckerman, David (2008). "Continuous Time Dynamic Topic Models". Proceedings of ICML. ICML '08.
Rennie, Jason D. M. "Mixtures of Multinomials" (PDF). Retrieved 5 December 2011.
Shalit, Uri; Weinshall, Daphna; Chechik, Gal (2013). "Modeling musical influence with topic models". Journal of Machine Learning Research.