Probabilistic latent semantic analysis

Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles), is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved. Compared to standard latent semantic analysis, which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model.

Model

[Figure: plate notation representing the PLSA model ("asymmetric" formulation). d is the document index variable, c is a word's topic drawn from the document's topic distribution, P(c|d), and w is a word drawn from the word distribution of this word's topic, P(w|c). The d and w are observable variables, while the topic c is a latent variable.]

Considering observations in the form of co-occurrences (w, d) of words and documents, PLSA models the probability of each co-occurrence as a mixture of conditionally independent multinomial distributions:

P(w,d) = \sum_c P(c) P(d|c) P(w|c) = P(d) \sum_c P(c|d) P(w|c)

with c being the words' topic. Note that the number of topics is a hyperparameter that must be chosen in advance and is not estimated from the data. The first formulation is the symmetric formulation, where w and d are both generated from the latent class c in similar ways (using the conditional probabilities P(d|c) and P(w|c)), whereas the second formulation is the asymmetric formulation, where, for each document d, a latent class is chosen conditionally to the document according to P(c|d), and a word is then generated from that class according to P(w|c). Although we have used words and documents in this example, the co-occurrence of any couple of discrete variables may be modelled in exactly the same way.
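
The two factorisations above describe the same joint distribution. The following Python sketch (an illustrative example with assumed toy dimensions and randomly initialised parameter tables, not code from any referenced implementation) evaluates P(w,d) both ways and checks that they agree:

    # Minimal sketch of the two equivalent PLSA factorisations (toy sizes assumed).
    import numpy as np

    rng = np.random.default_rng(0)
    n_topics, n_docs, n_words = 3, 5, 8          # assumed toy dimensions

    def normalize(a, axis):
        return a / a.sum(axis=axis, keepdims=True)

    P_c = normalize(rng.random(n_topics), 0)                      # P(c)
    P_d_given_c = normalize(rng.random((n_docs, n_topics)), 0)    # P(d|c): each column sums to 1
    P_w_given_c = normalize(rng.random((n_words, n_topics)), 0)   # P(w|c): each column sums to 1

    # Symmetric formulation: P(w,d) = sum_c P(c) P(d|c) P(w|c)
    P_wd_symmetric = np.einsum("c,dc,wc->wd", P_c, P_d_given_c, P_w_given_c)

    # Asymmetric formulation: P(w,d) = P(d) sum_c P(c|d) P(w|c),
    # with P(d) and P(c|d) obtained from the same joint by Bayes' rule.
    P_dc = P_d_given_c * P_c                     # joint P(d,c)
    P_d = P_dc.sum(axis=1)                       # marginal P(d)
    P_c_given_d = P_dc / P_d[:, None]            # P(c|d)
    P_wd_asymmetric = P_d[None, :] * (P_w_given_c @ P_c_given_d.T)

    assert np.allclose(P_wd_symmetric, P_wd_asymmetric)   # the two factorisations agree
    assert np.isclose(P_wd_symmetric.sum(), 1.0)          # a proper joint distribution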

So, the number of parameters is equal to cd + wc. The number of parameters grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents. The parameters are learned using the EM algorithm.
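
The update equations are not spelled out above; the sketch below shows one standard form of the EM iteration for the asymmetric formulation (E-step responsibilities P(c|d,w) proportional to P(c|d) P(w|c), M-step re-normalisation of expected counts). The function name, the dense document-by-word count matrix, the fixed iteration count, and the smoothing constant are illustrative assumptions:

    # Minimal EM sketch for asymmetric PLSA; `counts` is an assumed dense
    # document-by-word co-occurrence matrix n(d, w) (toy scale only).
    import numpy as np

    def plsa_em(counts, n_topics, n_iters=50, seed=0):
        rng = np.random.default_rng(seed)
        n_docs, n_words = counts.shape

        # Random, row-normalised initial parameters P(c|d) and P(w|c).
        P_c_given_d = rng.random((n_docs, n_topics))
        P_c_given_d /= P_c_given_d.sum(axis=1, keepdims=True)
        P_w_given_c = rng.random((n_topics, n_words))
        P_w_given_c /= P_w_given_c.sum(axis=1, keepdims=True)

        for _ in range(n_iters):
            # E-step: responsibilities P(c|d,w), shape (doc, topic, word).
            post = P_c_given_d[:, :, None] * P_w_given_c[None, :, :]
            post /= post.sum(axis=1, keepdims=True) + 1e-12

            # M-step: re-estimate parameters from expected counts n(d,w) P(c|d,w).
            weighted = counts[:, None, :] * post
            P_w_given_c = weighted.sum(axis=0)
            P_w_given_c /= P_w_given_c.sum(axis=1, keepdims=True) + 1e-12
            P_c_given_d = weighted.sum(axis=2)
            P_c_given_d /= P_c_given_d.sum(axis=1, keepdims=True) + 1e-12

        return P_c_given_d, P_w_given_c

    # Parameter count from the text, cd + wc, for a toy corpus of
    # 5 documents, 8 words and 3 topics: 3*5 + 8*3 = 39 entries.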

Application

PLSA may be used in a discriminative setting, via Fisher kernels. PLSA has applications in information retrieval and filtering, natural language processing, machine learning from text, bioinformatics, and related areas. It is reported that the aspect model used in probabilistic latent semantic analysis has severe overfitting problems.

Extensions

Hierarchical extensions:
Asymmetric: MASHA ("Multinomial ASymmetric Hierarchical Analysis")
Symmetric: HPLSA ("Hierarchical Probabilistic Latent Semantic Analysis")

Generative models: The following models have been developed to address an often-criticized shortcoming of PLSA, namely that it is not a proper generative model for new documents.
Latent Dirichlet allocation – adds a Dirichlet prior on the per-document topic distribution.

Higher-order data: Although this is rarely discussed in the scientific literature, PLSA extends naturally to higher-order data (three modes and higher), i.e. it can model co-occurrences over three or more variables. In the symmetric formulation above, this is done simply by adding conditional probability distributions for these additional variables. This is the probabilistic analogue to non-negative tensor factorisation; an illustrative three-mode factorisation is given after this list.
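
As an illustration of the higher-order case (not written out in the article), adding a third observed discrete variable a to the symmetric formulation introduces one more conditional distribution per topic:

    P(w, d, a) = \sum_c P(c) P(w|c) P(d|c) P(a|c)

Estimation can then proceed as in the two-mode case, with the E-step responsibilities conditioned on the triple (w, d, a).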

History

This is an example of a latent class model (see references therein), and it is related to non-negative matrix factorization. The present terminology was coined in 1999 by Thomas Hofmann.

See also

Compound term processing
Pachinko allocation
Vector space model

References and notes

Thomas Hofmann, Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization, Advances in Neural Information Processing Systems 12, pp. 914–920, MIT Press, 2000
Thomas Hofmann, Probabilistic Latent Semantic Indexing, Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval (SIGIR-99), 1999
Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (2003). "Latent Dirichlet Allocation". Journal of Machine Learning Research 3: 993–1022. doi:10.1162/jmlr.2003.3.4-5.993
Pinoli, Pietro; et al. (2013). "Enhanced probabilistic latent semantic analysis with weighting schemes to predict genomic annotations". Proceedings of IEEE BIBE 2013, The 13th IEEE International Conference on BioInformatics and BioEngineering. IEEE. pp. 1–4. doi:10.1109/BIBE.2013.6701702. ISBN 978-147993163-7.
Alexei Vinokourov and Mark Girolami, A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections, Information Processing and Management, 2002
Eric Gaussier, Cyril Goutte, Kris Popat and Francine Chen, A Hierarchical Model for Clustering and Categorising Documents, in "Advances in Information Retrieval -- Proceedings of the 24th BCS-IRSG European Colloquium on IR Research (ECIR-02)", 2002
Chris Ding, Tao Li, Wei Peng (2006). "Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence Chi-Square Statistic, and a Hybrid Method". AAAI 2006
Chris Ding, Tao Li, Wei Peng (2008). "On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing"

External links

Probabilistic Latent Semantic Analysis
Complete PLSA DEMO in C#
