Word embedding
Method in natural language processing

In natural language processing (NLP), a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words that are closer in the vector space are expected to be similar in meaning. Word embeddings can be obtained using language modeling and feature learning techniques, where words or phrases from the vocabulary are mapped to vectors of real numbers.

Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, probabilistic models, explainable knowledge-base methods, and explicit representation in terms of the context in which words appear.

Word and phrase embeddings, when used as the underlying input representation, have been shown to boost performance in NLP tasks such as syntactic parsing and sentiment analysis.
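The geometric intuition, that similar words lie close together, is usually measured with cosine similarity. The following is a minimal illustrative sketch in Python with NumPy; the toy vectors and their values are invented for the example and are not taken from any trained model:

```python
import numpy as np

# Toy embeddings; real models use hundreds of dimensions (values invented here).
vectors = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.70, 0.12, 0.06]),
    "apple": np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # high, near 1.0
print(cosine_similarity(vectors["king"], vectors["apple"]))  # low
```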
Development and history of the approach

In distributional semantics, a quantitative methodological approach to understanding meaning in observed language, word embeddings or semantic feature space models have been used as a knowledge representation for some time. Such models aim to quantify and categorize semantic similarities between linguistic items based on their distributional properties in large samples of language data. The underlying idea that "a word is characterized by the company it keeps" was proposed in a 1957 article by John Rupert Firth, but the approach also has roots in contemporaneous work on search systems and in cognitive psychology.

The notion of a semantic space with lexical items (words or multi-word terms) represented as vectors or embeddings grew out of the computational challenge of capturing distributional characteristics and using them practically to measure similarity between words, phrases, or entire documents. The first generation of semantic space models was the vector space model for information retrieval. Implemented in their simplest form, such vector space models for words and their distributional data result in a very sparse vector space of high dimensionality (cf. curse of dimensionality). Reducing the number of dimensions using linear algebraic methods such as singular value decomposition led to the introduction of latent semantic analysis in the late 1980s, and the random indexing approach for collecting word co-occurrence contexts.
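The classical pipeline described above can be illustrated in a few lines. The sketch below, with invented co-occurrence counts, builds a small word-by-word matrix and compresses it with truncated SVD in the spirit of latent semantic analysis; it is a toy illustration, not the original LSA implementation:

```python
import numpy as np

vocab = ["king", "queen", "crown", "apple", "fruit"]
# cooc[i, j] = how often vocab[i] appears near vocab[j] (toy numbers).
cooc = np.array([
    [0, 8, 6, 0, 0],
    [8, 0, 5, 0, 0],
    [6, 5, 0, 0, 0],
    [0, 0, 0, 0, 9],
    [0, 0, 0, 9, 0],
], dtype=float)

# Truncated SVD: keep the top k singular directions as dense embeddings.
U, S, _ = np.linalg.svd(cooc)
k = 2
embeddings = U[:, :k] * S[:k]   # one k-dimensional vector per word
print(dict(zip(vocab, embeddings.round(2))))
```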
In 2000, Bengio et al. showed, in a series of papers titled "Neural probabilistic language models", how to reduce the high dimensionality of word representations in contexts by "learning a distributed representation for words". A study published in NeurIPS (NIPS) 2002 introduced the use of both word and document embeddings by applying kernel CCA to bilingual (and multilingual) corpora, also providing an early example of self-supervised learning of word embeddings.

Word embeddings come in two different styles: one in which words are expressed as vectors of co-occurring words, and another in which words are expressed as vectors of the linguistic contexts in which the words occur; these different styles are studied in Lavelli et al. (2004). Roweis and Saul published in Science how to use "locally linear embedding" (LLE) to discover representations of high-dimensional data structures. Most new word embedding techniques after about 2005 rely on a neural network architecture instead of more probabilistic and algebraic models, following foundational work by Yoshua Bengio and colleagues.

The approach was adopted by many research groups after theoretical advances in 2010 on the quality of vectors and the training speed of the model, and after hardware advances allowed a broader parameter space to be explored profitably. In 2013, a team at Google led by Tomas Mikolov created word2vec, a word embedding toolkit that can train vector space models faster than previous approaches. The word2vec approach has been widely used in experimentation and was instrumental in raising interest in word embeddings as a technology, moving the research strand out of specialised research into broader experimentation and eventually paving the way for practical application.
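As an illustration, the sketch below trains a small skip-gram model with the Gensim library, a widely used reimplementation of word2vec rather than the original toolkit; the three-sentence corpus is a stand-in for the large text collections used in practice:

```python
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["apples", "and", "oranges", "are", "fruit"],
]

# sg=1 selects the skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["king"].shape)                  # (50,): one dense vector per word
print(model.wv.most_similar("king", topn=2))   # nearest neighbours in the space
```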
Polysemy and homonymy

Historically, one of the main limitations of static word embeddings, or word vector space models in general, is that words with multiple meanings are conflated into a single representation (a single vector in the semantic space). In other words, polysemy and homonymy are not handled properly. For example, in the sentence "The club I tried yesterday was great!", it is not clear whether the term club relates to the word sense of club sandwich, clubhouse, golf club, or any other sense that club might have. The need to accommodate multiple meanings per word in different vectors (multi-sense embeddings) has motivated several contributions in NLP to split single-sense embeddings into multi-sense ones.

Most approaches that produce multi-sense embeddings can be divided into two main categories by their word-sense representation: unsupervised and knowledge-based. Based on word2vec skip-gram, Multi-Sense Skip-Gram (MSSG) performs word-sense discrimination and embedding simultaneously, improving training time, while assuming a specific number of senses for each word. In the Non-Parametric Multi-Sense Skip-Gram (NP-MSSG) this number can vary depending on the word. Combining the prior knowledge of lexical databases (e.g., WordNet, ConceptNet, BabelNet) with word embeddings and word sense disambiguation, Most Suitable Sense Annotation (MSSA) labels word senses through an unsupervised and knowledge-based approach, considering a word's context in a pre-defined sliding window. Once the words are disambiguated, they can be used in a standard word embedding technique, so multi-sense embeddings are produced. The MSSA architecture allows the disambiguation and annotation process to be performed recurrently in a self-improving manner.

The use of multi-sense embeddings is known to improve performance in several NLP tasks, such as part-of-speech tagging, semantic relation identification, semantic relatedness, named entity recognition and sentiment analysis.

As of the late 2010s, contextually-meaningful embeddings such as ELMo and BERT have been developed. Unlike static word embeddings, these embeddings are at the token level, in that each occurrence of a word has its own embedding. These embeddings better reflect the multi-sense nature of words, because occurrences of a word in similar contexts are situated in similar regions of BERT's embedding space.
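The difference between static and contextual embeddings can be seen directly with the club example. A hedged sketch using the Hugging Face transformers library and the public bert-base-uncased checkpoint (both assumptions, not part of the cited works); each occurrence of "club" receives its own vector:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "He joined the chess club at school.",
    "She ordered a club sandwich for lunch.",
]

with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # One vector per token: the same surface form "club" gets a
        # different embedding in each sentence.
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        club_vector = outputs.last_hidden_state[0, tokens.index("club")]
        print(text, club_vector.shape)  # torch.Size([768])
```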
For biological sequences: BioVectors

Word embeddings for n-grams in biological sequences (e.g. DNA, RNA, and proteins) for bioinformatics applications have been proposed by Asgari and Mofrad. Named bio-vectors (BioVec) for biological sequences in general, with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. The results presented by Asgari and Mofrad suggest that BioVectors can characterize biological sequences in terms of biochemical and biophysical interpretations of the underlying patterns.
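A hedged sketch of the BioVectors idea follows: treat overlapping 3-grams of a protein sequence as "words" and train a standard embedding model on them. The sequences below are invented, and this sketch only mirrors the n-gram preprocessing step, not the original training setup:

```python
from gensim.models import Word2Vec

def to_3grams(seq: str) -> list[str]:
    """Overlapping amino-acid 3-grams, the 'words' of a protein 'sentence'."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

proteins = ["MKTAYIAKQR", "MKTAYLAKQR", "GAVLIPFYWS"]
sentences = [to_3grams(p) for p in proteins]

model = Word2Vec(sentences=sentences, vector_size=20, window=5, min_count=1)
print(model.wv["MKT"])  # embedding for the 3-gram 'MKT'
```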
Game design

Word embeddings with applications in game design have been proposed by Rabii and Cook as a way to discover emergent gameplay using logs of gameplay data. The process requires transcribing the actions that occur during a game into a formal language and then using the resulting text to create word embeddings. The results presented by Rabii and Cook suggest that the resulting vectors can capture expert knowledge about games like chess that is not explicitly stated in the game's rules.
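The pipeline can be sketched as follows: each game log becomes a "sentence" of move tokens, and an off-the-shelf embedding trainer is run on the logs. The moves and the use of Gensim here are illustrative assumptions, not Rabii and Cook's implementation:

```python
from gensim.models import Word2Vec

game_logs = [
    ["e2e4", "e7e5", "g1f3", "b8c6"],  # one game = one 'sentence' of moves
    ["d2d4", "d7d5", "c2c4", "e7e6"],
    ["e2e4", "c7c5", "g1f3", "d7d6"],
]

model = Word2Vec(sentences=game_logs, vector_size=16, window=3, min_count=1)
# Moves that occur in similar contexts end up with similar vectors.
print(model.wv.most_similar("e2e4", topn=2))
```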
Sentence embeddings

Main article: Sentence embedding

The idea has been extended to embeddings of entire sentences or even documents, e.g. in the form of the thought vectors concept. In 2015, some researchers suggested "skip-thought vectors" as a means to improve the quality of machine translation. A more recent and popular approach for representing sentences is Sentence-BERT, or SentenceTransformers, which modifies pre-trained BERT with the use of siamese and triplet network structures.
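A minimal usage sketch with the sentence-transformers library, assuming the publicly available all-MiniLM-L6-v2 checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Stock prices fell sharply today.",
]
embeddings = model.encode(sentences)  # one fixed-size vector per sentence

# Paraphrases score much higher than unrelated sentences.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```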
Software

Software for training and using word embeddings includes Tomáš Mikolov's Word2vec, Stanford University's GloVe, GN-GloVe, Flair embeddings, AllenNLP's ELMo, BERT, fastText, Gensim, Indra, and Deeplearning4j. Principal Component Analysis (PCA) and T-Distributed Stochastic Neighbour Embedding (t-SNE) are both used to reduce the dimensionality of word vector spaces and to visualize word embeddings and clusters.
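A minimal sketch of both projections with scikit-learn, using random vectors as a stand-in for trained word embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 300))   # stand-in for 100 word vectors

coords_pca = PCA(n_components=2).fit_transform(vectors)

# t-SNE preserves local neighbourhoods; perplexity must be < n_samples.
coords_tsne = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(vectors)

print(coords_pca.shape, coords_tsne.shape)  # (100, 2) (100, 2)
```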
Examples of application

For instance, fastText is also used to calculate word embeddings for the text corpora in Sketch Engine that are available online.
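A small illustrative sketch of fastText training via Gensim's reimplementation (an assumption; Sketch Engine uses its own pipeline). Because fastText builds vectors from character n-grams, it can also embed words never seen during training:

```python
from gensim.models import FastText

corpus = [
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["fasttext", "uses", "subword", "information"],
]

model = FastText(sentences=corpus, vector_size=32, window=3, min_count=1)
print(model.wv["embeddings"].shape)   # trained word
print(model.wv["embedding"].shape)    # unseen word, built from its n-grams
```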
Ethical implications

Word embeddings may contain the biases and stereotypes contained in the training dataset. Bolukbasi et al. point out in their 2016 paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" that a publicly available (and popular) word2vec embedding trained on Google News texts (a commonly used data corpus), which consist of text written by professional journalists, still shows disproportionate word associations reflecting gender and racial biases when extracting word analogies. For example, one of the analogies generated using the aforementioned word embedding is "man is to computer programmer as woman is to homemaker".

Research done by Jieyu Zhao et al. shows that applying these trained word embeddings without careful oversight likely perpetuates existing bias in society, introduced through unaltered training data. Furthermore, word embeddings can even amplify these biases.
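The analogy extraction described above can be reproduced with pre-trained vectors. A hedged sketch using Gensim's downloader and the word2vec Google News model; the phrase token computer_programmer is assumed to be present in that vocabulary, as reported in replications of Bolukbasi et al.:

```python
import gensim.downloader as api

# Very large download; loads the pre-trained Google News word2vec vectors.
wv = api.load("word2vec-google-news-300")

# Vector arithmetic: computer_programmer - man + woman -> nearest neighbours.
print(wv.most_similar(positive=["computer_programmer", "woman"],
                      negative=["man"], topn=3))
# Bolukbasi et al. report "homemaker" among the top answers, illustrating
# how social stereotypes in the training corpus surface as analogies.
```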
See also

Brown clustering
Distributional–relational database
References

Jurafsky, Daniel; Martin, James H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0-13-095069-7.
Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546.
Lebret, Rémi; Collobert, Ronan (2013). "Word Embeddings through Hellinger PCA". Conference of the European Chapter of the Association for Computational Linguistics (EACL). Vol. 2014. arXiv:1312.5542.
Levy, Omer; Goldberg, Yoav (2014). "Neural Word Embedding as Implicit Matrix Factorization". NIPS.
Li, Yitan; Xu, Linli (2015). "Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective". Int'l J. Conf. on Artificial Intelligence (IJCAI).
Globerson, Amir (2007). "Euclidean Embedding of Co-occurrence Data". Journal of Machine Learning Research.
Qureshi, M. Atif; Greene, Derek (2018). "EVE: explainable vector based embedding technique using Wikipedia". Journal of Intelligent Information Systems. 53: 137–165. arXiv:1702.06891. doi:10.1007/s10844-018-0511-x. ISSN 0925-9902. S2CID 10656055.
Levy, Omer; Goldberg, Yoav (2014). "Linguistic Regularities in Sparse and Explicit Word Representations". CoNLL. pp. 171–180.
Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). "Parsing with compositional vector grammars". Proc. ACL Conf. Archived from the original on 2016-08-11. Retrieved 2014-08-14.
Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank". EMNLP.
Sahlgren, Magnus. "A brief history of word embeddings".
Firth, J.R. (1957). "A synopsis of linguistic theory 1930–1955". Studies in Linguistic Analysis: 1–32. Reprinted in F.R. Palmer, ed. (1968). Selected Papers of J.R. Firth 1952–1959. London: Longman.
Luhn, H.P. (1953). "A New Method of Recording and Searching Information". American Documentation. 4: 14–16. doi:10.1002/asi.5090040104.
Osgood, C.E.; Suci, G.J.; Tannenbaum, P.H. (1957). The Measurement of Meaning. University of Illinois Press.
Salton, Gerard (1962). "Some experiments in the generation of word and document associations". Proceedings of the December 4–6, 1962, Fall Joint Computer Conference (AFIPS '62). pp. 234–250. doi:10.1145/1461518.1461544. ISBN 9781450378796. S2CID 9937095.
Salton, Gerard; Wong, A.; Yang, C. S. (1975). "A Vector Space Model for Automatic Indexing". Communications of the ACM. 18 (11): 613–620. doi:10.1145/361219.361220. hdl:1813/6057. S2CID 6473756.
Dubin, David (2004). "The most influential paper Gerard Salton never wrote". Archived from the original on 18 October 2020. Retrieved 18 October 2020.
Kanerva, Pentti; Kristoferson, Jan; Holst, Anders (2000). "Random Indexing of Text Samples for Latent Semantic Analysis". Proceedings of the 22nd Annual Conference of the Cognitive Science Society. Mahwah, New Jersey: Erlbaum. p. 1036.
Karlgren, Jussi; Sahlgren, Magnus (2001). Uesaka, Yoshinori; Kanerva, Pentti; Asoh, Hideki (eds.). "From words to understanding". Foundations of Real-World Intelligence. CSLI Publications: 294–308.
Sahlgren, Magnus (2005). "An Introduction to Random Indexing". Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE 2005), August 16, Copenhagen, Denmark.
Sahlgren, Magnus; Holst, Anders; Kanerva, Pentti (2008). "Permutations as a Means to Encode Order in Word Space". Proceedings of the 30th Annual Conference of the Cognitive Science Society: 1300–1305.
Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal (2000). "A Neural Probabilistic Language Model". NeurIPS.
Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal; Jauvin, Christian (2003). "A Neural Probabilistic Language Model". Journal of Machine Learning Research. 3: 1137–1155.
Bengio, Yoshua; Schwenk, Holger; Senécal, Jean-Sébastien; Morin, Fréderic; Gauvain, Jean-Luc (2006). "A Neural Probabilistic Language Model". Studies in Fuzziness and Soft Computing. Vol. 194. Springer. pp. 137–186. doi:10.1007/3-540-33486-6_6. ISBN 978-3-540-30609-2.
Vinokourov, Alexei; Cristianini, Nello; Shawe-Taylor, John (2002). "Inferring a semantic representation of text via cross-language correlation analysis". Advances in Neural Information Processing Systems. Vol. 15.
Lavelli, Alberto; Sebastiani, Fabrizio; Zanoli, Roberto (2004). "Distributional term representations: an experimental comparison". 13th ACM International Conference on Information and Knowledge Management. pp. 615–624. doi:10.1145/1031171.1031284.
Roweis, Sam T.; Saul, Lawrence K. (2000). "Nonlinear Dimensionality Reduction by Locally Linear Embedding". Science. 290 (5500): 2323–2326. Bibcode:2000Sci...290.2323R. CiteSeerX 10.1.1.111.3313. doi:10.1126/science.290.5500.2323. PMID 11125150. S2CID 5987139.
Morin, Fredric; Bengio, Yoshua (2005). "Hierarchical probabilistic neural network language model". In Cowell, Robert G.; Ghahramani, Zoubin (eds.). Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research. Vol. R5. pp. 246–252.
Mnih, Andriy; Hinton, Geoffrey (2009). "A Scalable Hierarchical Distributed Language Model". Advances in Neural Information Processing Systems. 21 (NIPS 2008). Curran Associates, Inc.: 1081–1088.
"word2vec". Google Code Archive. Retrieved 23 July 2021.
Reisinger, Joseph; Mooney, Raymond J. (2010). "Multi-Prototype Vector-Space Models of Word Meaning". Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles, California: Association for Computational Linguistics. pp. 109–117. ISBN 978-1-932432-65-7. Retrieved October 25, 2019.
Huang, Eric (2012). "Improving word representations via global context and multiple word prototypes". OCLC 857900050.
Camacho-Collados, Jose; Pilehvar, Mohammad Taher (2018). "From Word to Sense Embeddings: A Survey on Vector Representations of Meaning". arXiv:1805.04032.
Neelakantan, Arvind; Shankar, Jeevan; Passos, Alexandre; McCallum, Andrew (2014). "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA: Association for Computational Linguistics. pp. 1059–1069. arXiv:1504.06654. doi:10.3115/v1/d14-1113. S2CID 15251438.
Ruas, Terry; Grosky, William; Aizawa, Akiko (2019). "Multi-sense embeddings through a word sense disambiguation process". Expert Systems with Applications. 136: 288–303. arXiv:2101.08700. doi:10.1016/j.eswa.2019.06.026. hdl:2027.42/145475. ISSN 0957-4174. S2CID 52225306.
Agre, Gennady; Petrov, Daniel; Keskinova, Simona (2019). "Word Sense Disambiguation Studio: A Flexible System for WSD Feature Extraction". Information. 10 (3): 97. doi:10.3390/info10030097. ISSN 2078-2489.
Li, Jiwei; Jurafsky, Dan (2015). "Do Multi-Sense Embeddings Improve Natural Language Understanding?". Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics. pp. 1722–1732. arXiv:1506.01070. doi:10.18653/v1/d15-1200. S2CID 6222768.
Akbik, Alan; Blythe, Duncan; Vollgraf, Roland (2018). "Contextual String Embeddings for Sequence Labeling". Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico: Association for Computational Linguistics: 1638–1649.
Lucy, Li; Bamman, David (2021). "Characterizing English variation across social media communities with BERT". Transactions of the Association for Computational Linguistics. 9: 538–556.
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (June 2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics: 4171–4186. doi:10.18653/v1/N19-1423. S2CID 52967399.
Reif, Emily; Yuan, Ann; Wattenberg, Martin; Viegas, Fernanda B.; Coenen, Andy; Pearce, Adam; Kim, Been (2019). "Visualizing and measuring the geometry of BERT". Advances in Neural Information Processing Systems. 32.
Pires, Telmo; Schlinger, Eva; Garrette, Dan (2019). "How multilingual is Multilingual BERT?". arXiv:1906.01502.
Dieng, Adji B.; Ruiz, Francisco J. R.; Blei, David M. (2020). "Topic Modeling in Embedding Spaces". Transactions of the Association for Computational Linguistics. 8: 439–453. arXiv:1907.04907. doi:10.1162/tacl_a_00325.
Asgari, Ehsaneddin; Mofrad, Mohammad R. K. (2015). "Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics". PLOS ONE. 10 (11): e0141287. arXiv:1503.05140. Bibcode:2015PLoSO..1041287A. doi:10.1371/journal.pone.0141287. PMC 4640716. PMID 26555596.
Rabii, Younès; Cook, Michael (2021). "Revealing Game Dynamics via Word Embeddings of Gameplay Data". Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 17 (1): 187–194. doi:10.1609/aiide.v17i1.18907. ISSN 2334-0924. S2CID 248175634.
Kiros, Ryan; Zhu, Yukun; Salakhutdinov, Ruslan; Zemel, Richard S.; Torralba, Antonio; Urtasun, Raquel; Fidler, Sanja (2015). "Skip-Thought Vectors". arXiv:1506.06726.
Reimers, Nils; Gurevych, Iryna (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 3982–3992.
"GloVe".
"Elmo".
"Gensim".
"Indra". GitHub. 2018-10-25.
Ghassemi, Mohammad; Mark, Roger; Nemati, Shamim (2015). "A visualization of evolving clinical sentiment using vector representations of clinical notes". 2015 Computing in Cardiology Conference (CinC). Vol. 2015. pp. 629–632. doi:10.1109/CIC.2015.7410989. ISBN 978-1-5090-0685-4. PMC 5070922. PMID 27774487.
"Embedding Viewer". Lexical Computing. Archived from the original on 8 February 2018. Retrieved 7 February 2018.
Bolukbasi, Tolga; Chang, Kai-Wei; Zou, James; Saligrama, Venkatesh; Kalai, Adam (2016). "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings". arXiv:1607.06520.
Zhao, Jieyu; Wang, Tianlu; Yatskar, Mark; Ordonez, Vicente; Chang, Kai-Wei (2017). "Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints". Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 2979–2989. doi:10.18653/v1/D17-1323.
Zhao, Jieyu; et al. (2018). "Learning Gender-Neutral Word Embeddings". arXiv:1809.01496.
Petreski, Davor; Hashim, Ibrahim C. (2022). "Word embeddings are biased. But whose bias are they reflecting?". AI & Society. 38 (2): 975–982. doi:10.1007/s00146-022-01443-w. ISSN 1435-5655. S2CID 249112516.