The quality of a paraphrase depends on its context, whether it is being used as a summary, and how it is generated, among other factors. Additionally, a good paraphrase is usually lexically dissimilar from its source phrase. The simplest way to evaluate paraphrase generation is with human judges; unfortunately, human evaluation tends to be time-consuming. Automated evaluation is also challenging, since it is essentially a problem as difficult as paraphrase recognition itself. Although originally designed to evaluate machine translation, the bilingual evaluation understudy (BLEU) metric has been applied to paraphrase generation as well.
Skip-thought vectors are produced by a skip-thought model, which consists of three key components: an encoder and two decoders. Given a corpus of documents, the skip-thought model is trained to take a sentence as input and encode it into a skip-thought vector. The skip-thought vector is used as input for both decoders; one attempts to reproduce the previous sentence and the other the following sentence in its entirety. The encoder and decoders can be implemented with a recurrent neural network (RNN) or an LSTM.
918:
Autoencoder models predict word-replacement candidates with a one-hot distribution over the vocabulary, while autoregressive and sequence-to-sequence models generate new text based on the source, predicting one word at a time. More advanced efforts also exist to make paraphrasing controllable according to predefined quality dimensions.
177:
For example, the phrase "under control" in an English sentence is aligned with the phrase "unter Kontrolle" in its German counterpart. The phrase "unter Kontrolle" is then found in another German sentence, where the aligned English phrase is "in check", yielding "in check" as a paraphrase of "under control".
1449:
However, F1 scores are difficult to calculate: it is hard to produce a complete list of paraphrases for a given phrase, and good paraphrases are dependent upon context. A metric designed to counter these problems is ParaMetric.
877:
First, the encoding LSTM takes a one-hot encoding of all the words in a sentence as input and produces a final hidden vector, which represents the input sentence. The decoding LSTM then takes the hidden vector as input and generates a new sentence, terminating in an end-of-sentence token. The encoder and decoder are trained to take a phrase and reproduce the one-hot distribution of a corresponding paraphrase by minimizing perplexity using simple stochastic gradient descent.
669:
160:
Recurring patterns are found within clusters using multi-sequence alignment. The positions of argument words are then determined by finding areas of high variability within each cluster, that is, positions between words shared by more than 50% of a cluster's sentences.
1081:
The autoencoder is then applied recursively, with the new vectors as inputs, until a single vector is produced. Given an odd number of inputs, the first vector is forwarded as-is to the next level of recursion. The autoencoder is trained to reproduce every vector in the full recursion tree, including the initial word embeddings.
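The following NumPy sketch shows only the recursive combination step; the random projection stands in for a trained autoencoder and is an assumption made purely for illustration, not the published model:

    import numpy as np

    n = 8                                            # embedding dimensionality
    rng = np.random.default_rng(0)
    W_enc = rng.normal(size=(n, 2 * n))              # stand-in for trained autoencoder weights

    def encode_pair(a, b):
        # Combine two n-dimensional vectors into one n-dimensional vector.
        return np.tanh(W_enc @ np.concatenate([a, b]))

    def collapse(word_vectors):
        # Repeatedly encode neighboring pairs until one sentence vector remains,
        # forwarding the first vector unchanged whenever a level has an odd count.
        tree = list(word_vectors)
        level = list(word_vectors)
        while len(level) > 1:
            odd = len(level) % 2 == 1
            nxt = [level[0]] if odd else []
            for i in range(1 if odd else 0, len(level) - 1, 2):
                nxt.append(encode_pair(level[i], level[i + 1]))
            tree.extend(nxt[1:] if odd else nxt)     # the forwarded vector is already in the tree
            level = nxt
        return level[0], tree

    words = [rng.normal(size=n) for _ in range(4)]   # a four-word sentence
    sentence_vec, nodes = collapse(words)            # len(nodes) == 7, matching the example below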
1454:
ParaMetric calculates the precision and recall of an automatic paraphrase system by comparing the system's automatic alignment of paraphrases to a manual alignment of similar phrases. Because ParaMetric only rates the quality of phrase alignment, it can also be used to rate paraphrase generation systems, provided they use phrase alignment as part of their generation process.
1469:
Metrics specifically designed to evaluate paraphrase generation include paraphrase in n-gram change (PINC) and the paraphrase evaluation metric (PEM), along with the aforementioned ParaMetric. PINC is designed to be used together with BLEU and to cover its inadequacies, since BLEU has difficulty measuring lexical dissimilarity.
1483:
The Quora Question Pairs Dataset, which contains hundreds of thousands of duplicate questions, has become a common benchmark for evaluating paraphrase detectors. The most consistently reliable detectors all use the Transformer architecture and rely on large amounts of pre-training with more general data before fine-tuning on the question pairs.
1474:
PEM, on the other hand, attempts to evaluate the "adequacy, fluency, and lexical dissimilarity" of paraphrases by returning a single-value heuristic calculated from n-gram overlap in a pivot language.
The main concept is to produce a vector representation of a sentence and its components by recursively applying an autoencoder. Since paraphrases should have similar vector representations, the resulting vectors are processed and then fed into a neural network for classification.
161:
Pairings between patterns are then found by comparing similar variable words across the different corpora. Finally, new paraphrases can be generated by choosing a matching cluster for a source sentence and substituting the source sentence's arguments into any number of patterns in the cluster.
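The substitution step can be illustrated with the short Python sketch below; the pattern syntax and slot names are assumptions made for illustration and are not Barzilay and Lee's implementation:

    # Fill the argument slots of other patterns in the matched cluster.
    def substitute_arguments(pattern, arguments):
        out = pattern
        for slot, text in arguments.items():
            out = out.replace("{" + slot + "}", text)
        return out

    source_args = {"X": "the storm", "Y": "12", "Z": "3"}
    cluster_patterns = [
        "{X} injured {Y} people, {Z} seriously",
        "{Y} were hurt by {X}, among them {Z} were in serious condition",
    ]
    paraphrases = [substitute_arguments(p, source_args) for p in cluster_patterns]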
1479:
A large drawback of PEM, however, is that it must be trained on large, in-domain parallel corpora together with human judgments; in effect, it requires training a paraphrase recognition system in order to evaluate a paraphrase generation system.
919:
Typical quality dimensions include semantic preservation and lexical diversity. Many Transformer-based paraphrase generation methods rely on unsupervised learning to leverage large amounts of training data and to scale.
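As a concrete illustration, a pretrained sequence-to-sequence Transformer can be prompted to emit several candidate paraphrases. The sketch below uses the Hugging Face transformers API; "my-org/paraphrase-t5" is a hypothetical fine-tuned checkpoint used only for illustration, not a real released model:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # "my-org/paraphrase-t5" is a hypothetical fine-tuned checkpoint name.
    tokenizer = AutoTokenizer.from_pretrained("my-org/paraphrase-t5")
    model = AutoModelForSeq2SeqLM.from_pretrained("my-org/paraphrase-t5")

    inputs = tokenizer("paraphrase: The situation is under control.", return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=5, num_return_sequences=3, max_new_tokens=32)
    candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]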
446:
BLEU has been used successfully to evaluate paraphrase generation models as well. However, paraphrases often have several lexically different but equally valid solutions, which hurts BLEU and similar evaluation metrics.
1416:
Trained end-to-end on identification tasks, Transformers achieve strong results when transferring between domains and paraphrasing techniques, compared with more traditional machine learning methods such as logistic regression.
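A minimal sketch of this setup, using the Hugging Face transformers API, is shown below; the generic bert-base-uncased checkpoint is assumed, and in practice the classification head must first be fine-tuned on labeled paraphrase pairs:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Both sentences are encoded as a single pair; the head outputs two logits.
    enc = tokenizer("The cat sat on the mat.", "A cat was sitting on the mat.", return_tensors="pt")
    with torch.no_grad():
        probs = model(**enc).logits.softmax(dim=-1)  # meaningful only after fine-tuning on labeled pairs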
1236:
These models are so fluent in generating text that human experts cannot identify whether an example was human-authored or machine-generated. Transformer-based paraphrase generation relies on autoencoding, autoregressive, or sequence-to-sequence methods.
2169:
Proceedings of the 59th Annual
Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2132:
Proceedings of the 59th Annual
Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
A notable drawback of ParaMetric is the large, exhaustive set of manual alignments that must be created before a rating can be produced.
1437:
Multiple methods can be used to evaluate paraphrases. Since paraphrase recognition can be posed as a classification problem, most standard evaluation metrics, such as accuracy, F1 score, or an ROC curve, do relatively well.
664:{\displaystyle {\hat {e_{2}}}={\text{arg}}\max _{e_{2}\neq e_{1}}\Pr(e_{2}|e_{1},S)={\text{arg}}\max _{e_{2}\neq e_{1}}\sum _{f}\Pr(e_{2}|f,S)\Pr(f|e_{1},S)}
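The following Python sketch illustrates the pivot computation with hypothetical toy translation tables and without the sentence prior S; it is an illustration of the formula above rather than the published system:

    from collections import defaultdict

    # Hypothetical toy phrase tables; the probabilities are made up for illustration.
    p_f_given_e1 = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
    p_e2_given_f = {
        "unter kontrolle": {"under control": 0.6, "in check": 0.4},
        "im griff": {"under control": 0.5, "in check": 0.3, "a handle on": 0.2},
    }

    def paraphrase_scores(e1):
        # Sum Pr(e2|f) * Pr(f|e1) over every pivot phrase f, excluding e2 == e1.
        scores = defaultdict(float)
        for f, p_f in p_f_given_e1.get(e1, {}).items():
            for e2, p_e2 in p_e2_given_f.get(f, {}).items():
                if e2 != e1:
                    scores[e2] += p_e2 * p_f
        return dict(scores)

    scores = paraphrase_scores("under control")      # {"in check": 0.38, "a handle on": 0.04}
    best = max(scores, key=scores.get)               # "in check"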
. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon. pp. 190–200.
There has been success in using long short-term memory (LSTM) models to generate paraphrases. In short, such a model consists of an encoder and a decoder, both implemented as variations of a stacked residual LSTM.
1470:
PINC measures the lack of n-gram overlap between a source sentence and a candidate paraphrase. It is essentially the Jaccard distance between the two sentences' n-gram sets, rewarding candidates that avoid n-grams appearing in the source sentence while still maintaining semantic equivalence.
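A PINC-style score can be sketched as follows (one common formulation, averaging over n-gram orders 1 to 4; details of the published metric may differ):

    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def pinc(source, candidate, max_n=4):
        src, cand = source.split(), candidate.split()
        scores = []
        for n in range(1, max_n + 1):
            cand_set = ngrams(cand, n)
            if cand_set:
                overlap = len(cand_set & ngrams(src, n))
                scores.append(1 - overlap / len(cand_set))
        return sum(scores) / len(scores) if scores else 0.0

    score = pinc("the situation is under control", "the situation is in check")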
1584:
Wahle, Jan Philip; Ruas, Terry; Kirstein, Frederic; Gipp, Bela (2022). "How Large Language Models are Transforming Machine-Paraphrase Plagiarism".
1412:
Similar to how Transformer models influenced paraphrase generation, their application to identifying paraphrases has shown great success. Models such as BERT can be adapted with a binary classification layer and fine-tuned end-to-end on identification data.
1931:
Proceedings of the 2022 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies
Microsoft Research Paraphrase Corpus – a dataset of 5,800 sentence pairs extracted from news articles, annotated to indicate whether each pair captures semantic equivalence
2097:
1862:
1534:
1369:
Because the similarity matrices are not uniform in size across sentence pairs, each matrix is split into n_p roughly even sections for pooling. The output is then normalized to have mean 0 and standard deviation 1 and is fed into a fully connected layer with a softmax output.
1409:
895:
1139:
Given two sentences W1 and W2 of length 4 and 3 respectively, the autoencoders would produce 7 and 5 vector representations, including the initial word embeddings.
2314:
1446:
1396:
Since paraphrases carry the same semantic meaning, they should have similar skip-thought vectors. Thus a simple logistic regression can be trained to good performance with the absolute difference and component-wise product of two skip-thought vectors as input.
. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing. Honolulu, Hawaii. pp. 196–205.
." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014.
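Assuming sentence vectors are already available, the classifier described above can be sketched as follows; scikit-learn and placeholder random data are used purely for illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pair_features(u, v):
        # Absolute difference and component-wise product of the two sentence vectors.
        return np.concatenate([np.abs(u - v), u * v])

    rng = np.random.default_rng(0)
    X = np.stack([pair_features(rng.normal(size=64), rng.normal(size=64)) for _ in range(200)])
    y = rng.integers(0, 2, size=200)                 # placeholder labels: 1 = paraphrase, 0 = not

    clf = LogisticRegression(max_iter=1000).fit(X, y)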
. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. MIT, Massachusetts. pp. 923–932.
170:
1381:
Skip-thought vectors are an attempt to create a vector representation of the semantic meaning of a sentence, similar in spirit to the skip-gram model for words.
1999:
Kiros, Ryan; Zhu, Yukun; Salakhutdinov, Ruslan; Zemel, Richard; Torralba, Antonio; Urtasun, Raquel; Fidler, Sanja (2015),
Prakash, Aaditya; Hasan, Sadid A.; Lee, Kathy; Datla, Vivek; Qadir, Ashequl; Liu, Joey; Farri, Oladimeji (2016),
937:
45:
1886:
Bandel, Elron; Aharonov, Ranit; Shmueli-Scheuer, Michal; Shnayderman, Ilya; Slonim, Noam; Ein-Dor, Liat (2022).
886:. New paraphrases are generated by inputting a new phrase to the encoder and passing the output to the decoder.
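A minimal encoder-decoder sketch in PyTorch is shown below; it illustrates the general sequence-to-sequence setup rather than the stacked residual LSTM of the cited work, and the dimensions are arbitrary:

    import torch
    import torch.nn as nn

    class Seq2SeqParaphraser(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)   # distribution over the vocabulary

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.embed(src_ids))   # final state summarizes the source phrase
            dec_out, _ = self.decoder(self.embed(tgt_ids), state)
            return self.out(dec_out)                       # per-position logits for the paraphrase

    model = Seq2SeqParaphraser(vocab_size=10000)
    src = torch.randint(0, 10000, (2, 7))
    tgt = torch.randint(0, 10000, (2, 6))
    logits = model(src, tgt)
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tgt.reshape(-1))   # lowering this lowers perplexity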
1892:
Proceedings of the 60th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
1743:
Proceedings of the 60th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
1386:
2272:
. Proceedings of the 22nd International Conference on Computational Linguistics. Manchester. pp. 97–104.
289:
. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics. pp. 5136–5150.
. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics. pp. 5075–5086.
61:
2066:
Wahle, Jan Philip; Ruas, Terry; FoltΓ½nek, TomΓ‘Ε‘; Meuschke, Norman; Gipp, Bela (2022), Smits, Malte (ed.),
1493:
1390:
866:
1050:
1413:
1739:"Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text"
Niu, Tong; Yavuz, Semih; Zhou, Yingbo; Keskar, Nitish Shirish; Wang, Huan; Xiong, Caiming (2021).
Paraphrases can also be generated through phrase-based translation, as proposed by Bannard and Callison-Burch. The chief concept consists of aligning phrases in a pivot language to produce potential paraphrases in the original language.
30:
This article is about the automated generation and recognition of paraphrases. For other uses, see Paraphrase (disambiguation).
Socher, Richard; Huang, Eric; Pennington, Jeffrey; Ng, Andrew; Manning, Christopher (2011),
1370:
97:
Barzilay and Lee proposed a method to generate paraphrases through the use of monolingual parallel corpora, namely news articles covering the same event on the same day.
73:
. Proceedings of the 43rd Annual Meeting of the ACL. Ann Arbor, Michigan. pp. 597–604.
Training consists of using multi-sequence alignment to generate sentence-level paraphrases from an unannotated corpus.
1833:"Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection"
1373:
The dynamic pooling to softmax model is trained using pairs of known paraphrases.
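The pooling step can be sketched as follows with NumPy, assuming the similarity matrix is at least n_p in each dimension (a simplification of the published procedure):

    import numpy as np

    def dynamic_min_pool(S, n_p=4):
        # Split rows and columns into n_p roughly even groups and keep each cell's minimum.
        rows = np.array_split(np.arange(S.shape[0]), n_p)
        cols = np.array_split(np.arange(S.shape[1]), n_p)
        return np.array([[S[np.ix_(r, c)].min() for c in cols] for r in rows])

    S = np.random.rand(7, 5)                 # e.g. the 7 x 5 similarity matrix from the example
    pooled = dynamic_min_pool(S)             # always n_p x n_p, here 4 x 4
    normalized = (pooled - pooled.mean()) / pooled.std()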
2067:
. Seattle, United States: Association for Computational Linguistics. pp. 3254–3263.
The same autoencoder is applied to every pair of words in the sentence to produce ⌊m/2⌋ vectors.
2018:
1687:
2165:"ProtAugment: Intent Detection Meta-Learning through Unsupervised Diverse Paraphrasing"
1790:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Paraphrase recognition has been attempted by Socher et al. through the use of recursive autoencoders.
899:
784:
764:
360:
174:
1976:
1968:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
1939:
1714:
1706:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
1586:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
1458:
The evaluation of paraphrase generation has difficulties similar to the evaluation of machine translation.
With the introduction of Transformer models, paraphrase generation approaches improved their ability to generate text by scaling neural network parameters and heavily parallelizing training through feed-forward layers.
17:
Paraphrase Database (PPDB) – a searchable database containing millions of paraphrases in 16 different languages
2177:
2140:
2052:
1785:
1761:
1737:
Dou, Yao; Forbes, Maxwell; Koncel-Kedziorski, Rik; Smith, Noah; Choi, Yejin (2022).
1910:
1808:
878:
404:
Additionally, the sentence S is added as a prior to give the paraphrase context. Thus, the optimal paraphrase ê2 maximizes Pr(e2 | e1, S) summed over pivot phrases f, as given by the equation above.
105:
This is done by finding recurring patterns in each individual corpus and then pairing patterns across corpora that represent paraphrases.
2219:
1637:
Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
2089:
1854:
1784:
Liu, Xianggen; Mou, Lili; Meng, Fandong; Zhou, Hao; Zhou, Jie; Song, Sen (2020).
. Minneapolis, Minnesota: Association for Computational Linguistics: 4171–4186.
1484:
Other successful methods based on the Transformer architecture include adversarial learning and meta-learning.
, vol. 13192, Cham: Springer International Publishing, pp. 393–413,
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
2210:
1887:
2256:
2044:
1571:
Paraphrase or paraphrasing in computational linguistics is the natural language processing task of detecting and generating paraphrases. Applications of paraphrasing are varied, including information retrieval, question answering, text summarization, and plagiarism detection. Paraphrasing is also useful in the evaluation of machine translation, as well as for semantic parsing and the generation of new samples to expand existing corpora.
. Online: Association for Computational Linguistics. pp. 2454–2466.
. Online: Association for Computational Linguistics. pp. 7106–7116.
2031:
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2019).
. Dublin, Ireland: Association for Computational Linguistics: 7250–7274.
1442:
1438:
1382:
128:
finding pairings between such patterns that represent paraphrases, e.g. "X (injured/wounded) Y people, Z seriously" and "Y were (wounded/hurt) by X, among them Z were in serious condition".
2128:"Improving Paraphrase Detection with the Adversarial Paraphrasing Task"
2032:
1927:"Unsupervised Paraphrasability Prediction for Compound Nominalizations"
. Dublin, Ireland: Association for Computational Linguistics: 596–609.
915:
874:
156:
This is achieved by first clustering similar sentences together using n-gram overlap.
1831:
Wahle, Jan Philip; Ruas, Terry; Meuschke, Norman; Gipp, Bela (2021).
1476:
157:
1557:
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
781:
Adding S as a prior is modeled by calculating the probability of forming S when e1 is substituted with e2.
Dopierre, Thomas; Gravier, Christophe; Logerais, Wilfried (2021).
. Online and Abu Dhabi, United Arab Emirates. pp. 952–963.
1463:
1925:
Lee, John Sie Yuen; Lim, Ho Hung; Webster, Carol (2022).
1670:
Neural Paraphrase Generation with Stacked Residual LSTM Networks
761:
Pr(e2|f) and Pr(f|e1) can be approximated by simply taking their frequencies.
. Online: Association for Computational Linguistics: 302–312.
2258:
PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
2200:
Callison-Burch, Chris; Cohn, Trevor; Lapata, Mirella (2008).
109:
finding recurring patterns in each individual corpus, e.g. "X (injured/wounded) Y people, Z seriously", where X, Y, and Z are variables;
2298:
2202:
ParaMetric: An Automatic Evaluation Metric for Paraphrasing
1964:"Unsupervised Paraphrasing with Pretrained Language Models"
1837:
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
2241:
Collecting Highly Parallel Data for Paraphrase Evaluation
2072:
Information for a Better World: Shaping the Global Future
1702:"Paraphrase Generation: A Survey of the State of the Art"
1322:
27:
Automatic generation or recognition of paraphrased text
1527:"Advances in Neural Information Processing Systems 24"
1143:
The Euclidean distance is then taken between every combination of vectors from W1 and W2, producing a similarity matrix S of size 7 × 5.
Liu, Chang; Dahlmeier, Daniel; Ng, Hwee Tou (2010).
169:
2273:"Paraphrase Identification on Quora Question Pairs"
1786:"Unsupervised Paraphrasing by Simulated Annealing"
2033:"Proceedings of the 2019 Conference of the North"
Barzilay, Regina; Lee, Lillian (May–June 2003).
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
The probability distribution can be modeled as Pr(e2 | e1), the probability that phrase e2 is a paraphrase of phrase e1, which is equivalent to Pr(e2 | f) Pr(f | e1) summed over all f, each a potential phrase translation in the pivot language.
. Champaign, IL, USA: IEEE. pp. 226–229.
1651:Bannard, Colin; Callison-Burch, Chris (2005).
Callison-Burch, Chris (October 25–27, 2008).
1231:{\displaystyle S\in \mathbb {R} ^{7\times 5}}
Given a sentence W with m words, the autoencoder is designed to take 2 n-dimensional word embeddings as input and produce an n-dimensional vector as output.
178:"in check," a paraphrase of "under control."
8:
2068:"Identifying Machine-Paraphrased Plagiarism"
1068:
1054:
1888:"Quality Controlled Paraphrase Generation"
1629:
1627:
2209:
2176:
2139:
2126:Nighojkar, Animesh; Licato, John (2021).
350:{\displaystyle \Pr(e_{2}|f)\Pr(f|e_{1})}
2233:
2231:
2229:
1654:Paraphrasing Bilingual Parallel Corpora
1514:
1609:
68:. Paraphrasing is also useful in the
7:
2293:Microsoft Research Paraphrase Corpus
2238:Chen, David; Dolan, William (2008).
1570:Berant, Jonathan, and Percy Liang. "
1074:{\displaystyle \lfloor m/2\rfloor }
1700:Zhou, Jianing; Bhat, Suma (2021).
80:of new samples to expand existing
25:
1572:Semantic parsing via paraphrasing
1295:{\displaystyle n_{p}\times n_{p}}
S is then subjected to a dynamic min-pooling layer to produce a fixed-size n_p × n_p matrix.
70:evaluation of machine translation
52:task of detecting and generating
1640:. Proceedings of HLT-NAACL 2003.
865:There has been success in using
225:{\displaystyle \Pr(e_{2}|e_{1})}
165:Phrase-based Machine Translation
884:stochastic gradient descent
93:Multiple sequence alignment
50:natural language processing
32:Paraphrase (disambiguation)
2336:
2299:Paraphrase Database (PPDB)
1494:Round-trip translation
928:Recursive Autoencoders
923:Paraphrase recognition
867:long short-term memory
861:Long short-term memory
88:Paraphrase generation
2045:10.18653/v1/N19-1423
2001:Skip-Thought Vectors
1452:precision and recall
1423:adversarial learning
1377:Skip-thought vectors
66:plagiarism detection
2019:2015arXiv150606726K
1688:2016arXiv161003098P
1499:Text simplification
1460:machine translation
1419:logistic regression
1398:logistic regression
904:feed-forward layers
440:can be modeled as:
259:is a paraphrase of
1504:Text normalization
1410:Transformer models
1359:
1332:
1312:
1292:
1248:
1228:
1187:
1160:
1141:euclidean distance
1129:
1102:
1071:
1037:
1017:
993:
973:
953:
896:Transformer models
846:
818:
791:
771:
751:
707:
661:
594:
584:
506:
430:
394:
367:
347:
276:
249:
222:
132:(injured/wounded)
113:(injured/wounded)
62:text summarization
58:question answering
2099:978-3-030-96956-1
1864:978-1-6654-1770-9
2287:External links
1404:Transformers
890:Transformers
1510:References
1433:Evaluation
38:Paraphrase
1488:See also