Word2vec

top2vec reduces the document embeddings to a lower dimension (typically using UMAP). The space of documents is then scanned using HDBSCAN, and clusters of similar documents are found. Next, the centroid of the documents identified in a cluster is taken to be that cluster's topic vector. Finally, top2vec searches the semantic space for word embeddings located near the topic vector to ascertain the 'meaning' of the topic. The word whose embedding is most similar to the topic vector might be assigned as the topic's title, whereas word embeddings that are far away may be considered unrelated.

Asgari and Mofrad proposed bio-vectors (BioVec) for biological sequences in general, protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in applications of machine learning in proteomics and genomics. The results suggest that BioVectors can characterize biological sequences in terms of biochemical and biophysical interpretations of the underlying patterns. A similar variant, dna2vec, has shown that there is correlation between the Needleman–Wunsch similarity score and the cosine similarity of dna2vec word vectors.

Word2vec outperforms latent semantic analysis (LSA) when it is trained with a medium to large corpus size (more than 10 million words). With a small training corpus, however, LSA showed better performance. The authors also show that the best parameter setting depends on the task and the training corpus. Nevertheless, for skip-gram models trained on medium-sized corpora, 50 dimensions, a window size of 15 and 10 negative samples seem to be a good parameter setting.

Handling unknown or out-of-vocabulary (OOV) words and morphologically similar words is one of the biggest challenges of Word2vec. If the model has not encountered a particular word before, it is forced to use a random vector, which is generally far from the word's ideal representation. This can particularly be an issue in domains like medicine, where synonyms and related words are used depending on the preferred style of the radiologist, and words may have been used infrequently in a large corpus.
Such patterns can be produced through algebraic operations on the vector representations of words: the vector for "Brother" minus the vector for "Man" plus the vector for "Woman" produces a result which is closest to the vector representation of "Sister" in the model. Relationships of this kind can be generated for a range of semantic relations (such as Country–Capital) as well as syntactic relations (e.g. present tense–past tense).
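The analogy arithmetic above can be reproduced with an off-the-shelf implementation. Below is a minimal sketch using the Gensim library (one of the implementations mentioned on this page); the toy corpus, hyperparameter values and variable names are illustrative assumptions, and meaningful analogies only emerge from large corpora or pre-trained models.

```python
# Sketch: "Brother" - "Man" + "Woman" ≈ "Sister" with Gensim's Word2Vec.
# The corpus below is a placeholder; results on it are not meaningful.
from gensim.models import Word2Vec

toy_corpus = [
    ["man", "woman", "brother", "sister", "family"],
    ["the", "brother", "and", "the", "sister", "are", "siblings"],
]

model = Word2Vec(toy_corpus, vector_size=50, window=5, min_count=1, sg=1)

# vector('brother') - vector('man') + vector('woman') should rank 'sister' highly
print(model.wv.most_similar(positive=["brother", "woman"], negative=["man"], topn=3))

# Pairwise cosine similarity between two word vectors
print(model.wv.similarity("brother", "sister"))
```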
The first architecture, the Distributed Memory Model of Paragraph Vectors (PV-DM), is identical to CBOW except that it also provides a unique document identifier as a piece of additional context. The second architecture, the Distributed Bag of Words version of Paragraph Vector (PV-DBOW), is identical to the skip-gram model except that it attempts to predict the window of surrounding context words from the paragraph identifier instead of from the current word.
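A minimal sketch of the two architectures using Gensim's Doc2Vec class (one of the implementations mentioned below); the documents and parameter values are placeholder assumptions, and the dm flag is Gensim's name for the choice between PV-DM and PV-DBOW.

```python
# Sketch: PV-DM (dm=1) versus PV-DBOW (dm=0) in Gensim. Illustrative data only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["word2vec", "learns", "word", "vectors"], tags=["doc0"]),
    TaggedDocument(words=["doc2vec", "adds", "a", "document", "identifier"], tags=["doc1"]),
]

pv_dm = Doc2Vec(docs, vector_size=50, dm=1, min_count=1, epochs=40)    # PV-DM
pv_dbow = Doc2Vec(docs, vector_size=50, dm=0, min_count=1, epochs=40)  # PV-DBOW

# Inference of an embedding for a new, unseen document.
print(pv_dm.infer_vector(["an", "unseen", "document"])[:5])
```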
When assessing the quality of a vector model, a user may draw on this accuracy test which is implemented in word2vec, or develop their own test set which is meaningful to the corpora which make up the model. This approach offers a more challenging test than simply arguing that the words most similar to a given test word are intuitively plausible.
In models using large corpora and a high number of dimensions, the skip-gram model yields the highest overall accuracy, and consistently produces the highest accuracy on semantic relationships, as well as yielding the highest syntactic accuracy in most cases. However, CBOW is less computationally expensive and yields similar accuracy results.

Mikolov et al. (2013) developed an approach to assessing the quality of a word2vec model which draws on the semantic and syntactic patterns discussed above. They developed a set of 8,869 semantic relations and 10,675 syntactic relations which they use as a benchmark to test the accuracy of a model.

Levy et al. (2015) show that much of the superior performance of word2vec or similar embeddings in downstream tasks is not a result of the models per se, but of the choice of specific hyperparameters. Transferring these hyperparameters to more 'traditional' approaches yields similar performance in downstream tasks.

doc2vec also has the ability to capture the semantic 'meanings' for additional pieces of 'context' around words; doc2vec can estimate the semantic embeddings for speakers or speaker attributes, groups, and periods of time. For example, doc2vec has been used to estimate the political positions of political parties in various Congresses and Parliaments in the U.S. and U.K., respectively, and of various governmental institutions.

doc2vec estimates the distributed representations of documents much like word2vec estimates representations of words: doc2vec utilizes either of two model architectures, both of which are analogous to the architectures used in word2vec, namely the Distributed Memory Model of Paragraph Vectors (PV-DM) and the Distributed Bag of Words version of Paragraph Vector (PV-DBOW).

The major challenges of information extraction from clinical texts, which IWE is designed to address, include the ambiguity of free-text narrative style, lexical variations, the use of ungrammatical and telegraphic phrases, arbitrary ordering of words, and the frequent appearance of abbreviations and acronyms. Of particular interest, the IWE model (trained on one institutional dataset) successfully translated to a different institutional dataset, which demonstrates good generalizability of the approach across institutions.

The quantity on the left is fast to compute, but the quantity on the right is slow, as it involves summing over the entire vocabulary set for each word in the corpus. Furthermore, using gradient ascent to maximize the log-probability requires computing the gradient of the quantity on the right, which is intractable. This prompted the authors to use numerical approximation tricks.
In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. According to the authors' note, CBOW is faster while skip-gram does a better job for infrequent words.

The word-embedding approach is able to capture multiple different degrees of similarity between words. Mikolov et al. (2013) found that semantic and syntactic patterns can be reproduced using vector arithmetic, generating patterns such as "Man is to Woman as Brother is to Sister".

CBOW can be viewed as a 'fill in the blank' task, where the word embedding represents the way the word influences the relative probabilities of other words in the context window. Words which are semantically similar should influence these probabilities in similar ways, because semantically similar words should be used in similar contexts.

The use of different model parameters and different corpus sizes can greatly affect the quality of a word2vec model. Accuracy can be improved in a number of ways, including the choice of model architecture (CBOW or skip-gram), increasing the size of the training data set, increasing the number of vector dimensions, and increasing the window size of words considered by the algorithm.

This facet of word2vec has been exploited in a variety of other contexts. For example, word2vec has been used to map a vector space of words in one language to a vector space constructed from another language. Relationships between translated words in both spaces can be used to assist with machine translation of new words.

According to the authors, hierarchical softmax works better for infrequent words, while negative sampling works better for frequent words and better with low-dimensional vectors. As training epochs increase, hierarchical softmax stops being useful.
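The choice between the two training strategies is exposed as ordinary options in common implementations. Below is a minimal sketch using Gensim's parameter names (hs and negative); the toy corpus and the values are illustrative assumptions, not recommendations from this article.

```python
# Sketch: hierarchical softmax vs. negative sampling as Gensim training options.
from gensim.models import Word2Vec

corpus = [["a", "placeholder", "training", "sentence"]]  # toy corpus

# Hierarchical softmax (hs=1); negative sampling disabled.
m_hs = Word2Vec(corpus, vector_size=100, hs=1, negative=0, min_count=1)

# Negative sampling with 10 noise words per positive example (hs=0).
m_neg = Word2Vec(corpus, vector_size=100, hs=0, negative=10, min_count=1)
```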
Overall, accuracy increases with the number of words used and the number of dimensions. Mikolov et al. report that doubling the amount of training data results in an increase in computational complexity equivalent to doubling the number of vector dimensions.

As opposed to other topic models such as LDA, top2vec provides canonical 'distance' metrics between two topics, or between a topic and another embedding (word, document, or otherwise). Together with results from HDBSCAN, users can generate topic hierarchies, or groups of related topics and subtopics.

GloVe was developed by a team at Stanford specifically as a competitor, and the original paper noted multiple improvements of GloVe over word2vec. Mikolov argued that the comparison was unfair, as GloVe was trained on more data, and that the fastText project showed that word2vec is superior when trained on the same data.

The size of the context window determines how many words before and after a given word are included as context words of the given word. According to the authors' note, the recommended value is 10 for skip-gram and 5 for CBOW.
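As a concrete reading of the window parameter, the sketch below enumerates the (current word, context word) pairs produced by a symmetric context window; the toy corpus and the window size of 4 are placeholder assumptions.

```python
# Sketch: (current word, context word) pairs for a symmetric context window.
corpus = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
window = 4
offsets = [j for j in range(-window, window + 1) if j != 0]

pairs = []
for i, center in enumerate(corpus):
    for j in offsets:
        if 0 <= i + j < len(corpus):   # skip offsets that fall outside the corpus
            pairs.append((center, corpus[i + j]))

print(pairs[:8])
```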
Furthermore, a user can use the results of top2vec to infer the topics of out-of-sample documents. After inferring the embedding for a new document, one need only search the space of topics for the closest topic vector.

An extension of word vectors for creating a dense vector representation of unstructured radiology reports has been proposed by Banerjee et al. One of the biggest challenges with Word2vec is how to handle unknown or out-of-vocabulary (OOV) words.

Each of these improvements comes with the cost of increased computational complexity, and therefore increased model generation time.
The idea of skip-gram is that the vector of a word should be close to the vector of each of its neighbors. The idea of CBOW is that the vector-sum of a word's neighbors should be close to the vector of the word.
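The two ideas can be stated directly in terms of vectors. The sketch below, using random placeholder embeddings and cosine similarity as the closeness measure, computes the two quantities that skip-gram and CBOW respectively try to make large (the actual training objective uses the dot-product softmax described elsewhere on this page).

```python
# Sketch: the quantities skip-gram and CBOW try to make large, on placeholder vectors.
import numpy as np

rng = np.random.default_rng(0)
words = ["the", "cat", "sat"]
vec = {w: rng.normal(size=50) for w in words}

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

center, neighbors = "cat", ["the", "sat"]

# Skip-gram: the center word's vector should be close to each neighbor's vector.
skipgram_scores = [cos(vec[center], vec[n]) for n in neighbors]

# CBOW: the vector-sum of the neighbors should be close to the center word's vector.
cbow_score = cos(sum(vec[n] for n in neighbors), vec[center])

print(skipgram_scores, cbow_score)
```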
The reasons for successful word-embedding learning in the word2vec framework are poorly understood. Goldberg and Levy point out that the word2vec objective function causes words that occur in similar contexts to have similar embeddings (as measured by cosine similarity), and note that this is in line with J. R. Firth's distributional hypothesis. However, they note that this explanation is "very hand-wavy" and argue that a more formal explanation would be preferable.

Another extension of word2vec is top2vec, which leverages both document and word embeddings to estimate distributed representations of topics. top2vec takes document embeddings learned from a doc2vec model and reduces them into a lower dimension (typically using UMAP).

Word2vec is a technique in natural language processing for obtaining vector representations of words. These vectors capture information about the meaning of a word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus.

High-frequency and low-frequency words often provide little information. Words with a frequency above a certain threshold, or below a certain threshold, may be subsampled or removed to speed up training.

The quality of word embedding increases with higher dimensionality, but after reaching some point the marginal gain diminishes. Typically, the dimensionality of the vectors is set between 100 and 1,000.
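The training parameters discussed here (context window, sub-sampling of frequent words, dimensionality) map onto ordinary options in common implementations. The sketch below uses Gensim's parameter names; the toy corpus and the values are illustrative assumptions only.

```python
# Sketch: the main word2vec training parameters as exposed by Gensim.
from gensim.models import Word2Vec

corpus = [["a", "placeholder", "sentence"]]  # a real corpus would be far larger

model = Word2Vec(
    corpus,
    sg=1,             # 1 = skip-gram, 0 = CBOW
    window=10,        # context window (10 suggested for skip-gram, 5 for CBOW)
    vector_size=300,  # dimensionality, typically set between 100 and 1,000
    sample=1e-3,      # sub-sampling threshold for high-frequency words
    min_count=1,      # in practice a higher threshold (e.g. 5) drops rare words
)
```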
US 9037464, Mikolov, Tomas; Chen, Kai & Corrado, Gregory S. et al., "Computing numeric representations of words in a high-dimensional space", published 2015-05-19, assigned to Google Inc.

That is, we want to maximize the total probability for the corpus, as seen by a probability model that uses words to predict their word neighbors. We predict each word-neighbor independently; thus

Pr(w_j : j ∈ N + i | w_i) = ∏_{j ∈ N + i} Pr(w_j | w_i).
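The per-pair probabilities Pr(w_j | w_i) are given by a dot-product softmax over the vocabulary. The following NumPy sketch illustrates that computation with randomly initialized placeholder vectors; the vocabulary, names and sizes are assumptions for illustration only.

```python
# Sketch: dot-product-softmax probability Pr(w_j | w_i) over a toy vocabulary.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox"]
dim = 8
vec = {w: rng.normal(size=dim) for w in vocab}   # one vector v_w per word w

def prob(context_word: str, center_word: str) -> float:
    """Pr(context_word | center_word) as a softmax of dot products over the vocabulary."""
    scores = np.array([np.dot(vec[w], vec[center_word]) for w in vocab])
    scores -= scores.max()                        # for numerical stability
    exp_scores = np.exp(scores)
    return float(exp_scores[vocab.index(context_word)] / exp_scores.sum())

print(prob("quick", "the"))
```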
Arora et al. (2016) explain word2vec and related algorithms as performing inference for a simple generative model for text, which involves a random walk generation process based upon a loglinear topic model. They use this to explain some properties of word embeddings, including their use to solve analogies.

Altszyler, E.; Ribeiro, S.; Sigman, M.; Fernández Slezak, D. (2017). "The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text". Consciousness and Cognition 56: 178–187.
After the model has trained, the learned word embeddings are positioned in the vector space such that words that share common contexts in the corpus (that is, words that are semantically and syntactically similar) are located close to one another, while more dissimilar words are located farther from one another in the space.

Word2vec represents a word as a high-dimensional vector of numbers which capture relationships between words. In particular, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity, which indicates the level of semantic similarity between the words.
Mikolov, Tomáš; Karafiát, Martin; Burget, Lukáš; Černocký, Jan; Khudanpur, Sanjeev (26 September 2010). "Recurrent neural network based language model". Interspeech 2010. ISCA. pp. 1045–1048.

Altszyler and coauthors (2017) studied Word2vec performance in two semantic tests for different corpus sizes. They found that Word2vec has a steep learning curve, outperforming another word-embedding technique, latent semantic analysis (LSA), when trained with medium to large corpora.

That is, we want to maximize the total probability for the corpus, as seen by a probability model that uses word neighbors to predict words.

Mikolov, Tomas; Chen, Kai; Corrado, Greg; Dean, Jeffrey (16 January 2013). "Efficient Estimation of Word Representations in Vector Space". arXiv:1301.3781.

Embedding vectors created using the Word2vec algorithm have some advantages compared to earlier algorithms such as those using n-grams and latent semantic analysis.

A Word2vec model can be trained with hierarchical softmax and/or negative sampling. To approximate the conditional log-likelihood a model seeks to maximize, the hierarchical softmax method uses a Huffman tree to reduce calculation.

The order of context words does not influence prediction (the bag-of-words assumption).

Joulin, Armand; Grave, Edouard; Bojanowski, Piotr; Mikolov, Tomas (9 August 2016). "Bag of Tricks for Efficient Text Classification". arXiv:1607.01759.

It also took months for the code to be approved for open-sourcing. Other researchers helped analyse and explain the algorithm.

Suppose we want each word in the corpus to be predicted by every other word in a small span of 4 words on either side. We write the neighbor set N = {−4, −3, −2, −1, +1, +2, +3, +4}.
doc2vec has been implemented in the C, Python and Java/Scala tools (see below), with the Java and Python versions also supporting inference of document embeddings on new, unseen documents.

The negative sampling method, on the other hand, approaches the maximization problem by minimizing the log-likelihood of sampled negative instances.

Goldberg, Yoav; Levy, Omer (2014). "word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method". arXiv:1402.3722.

A corpus is a sequence of words. Both CBOW and skip-gram are methods to learn one vector per word appearing in the corpus. Let V (the "vocabulary") be the set of all words appearing in the corpus C; the goal is to learn a vector v_w ∈ R^n for each word w ∈ V.
Mikolov, Tomas; Yih, Wen-tau; Zweig, Geoffrey (2013). "Linguistic Regularities in Continuous Space Word Representations". HLT-NAACL: 746–751.

Another extension, doc2vec, generates distributed representations of variable-length pieces of text, such as sentences, paragraphs, or entire documents.

IWE combines Word2vec with a semantic dictionary mapping technique to tackle the major challenges of information extraction from clinical texts.

The probability model is still the dot-product-softmax model, so the calculation proceeds as before.

These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space.

Word2vec was created, patented, and published in 2013 by a team of researchers led by Mikolov at Google over two papers.

Le, Quoc; Mikolov, Tomas (May 2014). "Distributed Representations of Sentences and Documents". Proceedings of the 31st International Conference on Machine Learning. arXiv:1405.4053.

Ng, Patrick (2017). "dna2vec: Consistent vector representations of variable-length k-mers". arXiv:1701.06279.

Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. Word2vec was developed by Tomáš Mikolov and colleagues at Google and published in 2013.

Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg S.; Dean, Jeff (2013). "Distributed representations of words and phrases and their compositionality". Advances in Neural Information Processing Systems. arXiv:1310.4546.

Banerjee, Imon; Chen, Matthew C.; Lungren, Matthew P.; Rubin, Daniel L. (2018). "Radiology report annotation using intelligent word embeddings: Applied to multi-institutional chest CT cohort". Journal of Biomedical Informatics 77: 11–20.

Angelov, Dimo (August 2020). "Top2Vec: Distributed Representations of Topics". arXiv:2008.09470.
There is only a single difference from the CBOW equation (the index that was highlighted in red): the quantity to maximize becomes

∑_{i ∈ C, j ∈ N + i} ( v_{w_i} · v_{w_j} − ln ∑_{w ∈ V} e^{v_w · v_{w_i}} )

where the inner sum uses the center word's vector v_{w_i} rather than the context word's vector v_{w_j}.

For example, the vectors for "but" and "however" are nearby, as are those for "Berlin" and "Germany".

Products are numerically unstable, so we convert them by taking the logarithm of the corpus probability.

Jansen, Stefan (9 May 2017). "Word and Phrase Translation with word2vec". arXiv:1705.03127.

An extension of word vectors for n-grams in biological sequences (e.g. DNA, RNA, and proteins) for bioinformatics applications has been proposed by Asgari and Mofrad.

Word2vec can utilize either of two model architectures to produce these distributed representations of words: continuous bag-of-words (CBOW) or continuously sliding skip-gram. In both architectures, word2vec considers both individual words and a sliding context window as it iterates over the corpus.

Von der Mosel, Julian; Trautsch, Alexander; Herbold, Steffen (2022). "On the validity of pre-trained transformers for natural language processing in the software engineering domain". IEEE Transactions on Software Engineering 49 (4): 1487–1507.

As of 2022, the straight Word2vec approach was described as "dated". Transformer-based models, such as ELMo and BERT, which add multiple neural-network attention layers on top of a word embedding model similar to Word2vec, have come to be regarded as the state of the art in NLP.

The original paper was rejected by reviewers for the ICLR conference in 2013.

Then the training objective is to maximize the following quantity:

∏_{i ∈ C} Pr(w_i | w_j : j ∈ N + i).
Word2vec is a group of related models that are used to produce word embeddings.

Arora, S.; et al. (Summer 2016). "A Latent Variable Model Approach to PMI-based Word Embeddings". Transactions of the Association for Computational Linguistics 4: 385–399.

Levy, Omer; Goldberg, Yoav; Dagan, Ido (2015). "Improving Distributional Similarity with Lessons Learned from Word Embeddings". Transactions of the Association for Computational Linguistics 3: 211–225.

The quantity to be maximized is then, after simplifications:

∑_{i ∈ C, j ∈ N + i} ( v_{w_i} · v_{w_j} − ln ∑_{w ∈ V} e^{v_w · v_{w_j}} ).

Campello, Ricardo; Moulavi, Davoud; Sander, Joerg (2013). "Density-Based Clustering Based on Hierarchical Density Estimates". Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol. 7819. pp. 160–172.

That is, we maximize the log-probability of the corpus.

In the original publication, "closeness" is measured by softmax, but the framework allows other ways to measure closeness.

Rheault, Ludovic; Cochrane, Christopher (3 July 2019). "Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora". Political Analysis 28 (1).

Preservation of semantic and syntactic relationships
Asgari, Ehsaneddin; Mofrad, Mohammad R.K. (2015). "Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics". PLOS ONE 10 (11): e0141287.

See also: Autoencoder, Document-term matrix, Feature extraction, Feature learning, Neural network language models, Vector space model, Thought vector, fastText, GloVe, ELMo, BERT, Normalized compression distance, Semantle

Implementations: C, C#, Python (Spark), Python (TensorFlow), Python (Gensim), Java/Scala, R

External links: Wikipedia2Vec (introduction)

Word2Vec
Original author(s): Google AI
Repository: https://code.google.com/archive/p/word2vec/
Type: Language model, Word embedding
License: Apache-2.0
