Automatic summarization

have many similar sentences. To address this issue, LexRank applies a heuristic post-processing step that adds sentences in rank order, but discards sentences that are too similar to ones already in the summary. This method is called Cross-Sentence Information Subsumption (CSIS). These methods work based on the idea that sentences "recommend" other similar sentences to the reader. Thus, if one sentence is very similar to many others, it will likely be a sentence of great importance. Its importance also stems from the importance of the sentences "recommending" it. Thus, to get ranked highly and placed in a summary, a sentence must be similar to many sentences that are in turn also similar to many other sentences. This makes intuitive sense and allows the algorithms to be applied to an arbitrary new text. The methods are domain-independent and easily portable. One could imagine the features indicating important sentences in the news domain might vary considerably from the biomedical domain. However, the unsupervised "recommendation"-based approach applies to any domain.
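
The redundancy filter described above can be sketched as follows. This is a minimal illustration, not the LexRank implementation. It assumes the sentences are already sorted by rank, measures similarity with plain bag-of-words cosine, and uses a hypothetical cutoff of 0.5 for "too similar".

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_non_redundant(ranked: list[str], max_sentences: int,
                         threshold: float = 0.5) -> list[str]:
    """Add sentences in rank order, skipping any sentence whose cosine
    similarity to an already selected sentence reaches `threshold`."""
    summary: list[str] = []
    for sentence in ranked:
        if len(summary) >= max_sentences:
            break
        if all(cosine(sentence, chosen) < threshold for chosen in summary):
            summary.append(sentence)
    return summary


ranked = [
    "the storm flooded the city",
    "the storm flooded the whole city",  # near-duplicate of the first
    "officials ordered an evacuation",
]
print(select_non_redundant(ranked, max_sentences=2))
# ['the storm flooded the city', 'officials ordered an evacuation']
```

The near-duplicate second sentence is skipped even though it ranks higher than the third, which is the behavior the heuristic is meant to produce.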

unigram "learning" might co-occur with "machine", "supervised", "unsupervised", and "semi-supervised" in four different sentences. Thus, the "learning" vertex would be a central "hub" that connects to these other modifying words. Running PageRank/TextRank on the graph is likely to rank "learning" highly. Similarly, if the text contains the phrase "supervised classification", then there would be an edge between "supervised" and "classification". If "classification" appears in several other places and thus has many neighbors, its importance would contribute to the importance of "supervised". If "supervised" ends up with a high rank, it will be selected as one of the top T unigrams, along with "learning" and probably "classification". In the final post-processing step, we would then end up with the keyphrases "supervised learning" and "supervised classification".
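
A minimal sketch of building such a co-occurrence graph, assuming the tokens have already been lowercased and filtered (for example, to nouns and adjectives). The function name and the `window` parameter are illustrative. Here two distinct words are linked when they fall within `window` consecutive tokens, so `window=2` links only adjacent words.

```python
def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict[str, set[str]]:
    """Undirected co-occurrence graph as an adjacency map.

    Two distinct words are linked if they appear within `window`
    consecutive tokens of each other (window=2 links only neighbours).
    """
    graph: dict[str, set[str]] = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())
        # Look ahead at most window - 1 positions; edges are added both ways.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


tokens = ["advanced", "natural", "language", "processing",
          "natural", "language", "generation"]
graph = cooccurrence_graph(tokens, window=3)
print(sorted(graph["natural"]))
# ['advanced', 'generation', 'language', 'processing']
```

With `window=3`, "natural" and "processing" are linked even though they are not adjacent, which is the kind of link described for a text about NLP.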

position in the document (i.e., the first few sentences are probably important), the number of words in the sentence, etc. The main difficulty in supervised extractive summarization is that the known summaries must be manually created by extracting sentences so the sentences in an original training document can be labeled as "in summary" or "not in summary". This is not typically how people create summaries, so simply using journal abstracts or existing summaries is usually not sufficient. The sentences in these summaries do not necessarily match up with sentences in the original text, so it would be difficult to assign labels to examples for training. Note, however, that these natural summaries can still be used for evaluation purposes, since ROUGE-1 evaluation only considers unigrams.

second step that merges highly ranked adjacent unigrams to form multi-word phrases. This has a nice side effect of allowing us to produce keyphrases of arbitrary length. For example, if we rank unigrams and find that "advanced", "natural", "language", and "processing" all get high ranks, then we would look at the original text and see that these words appear consecutively and create a final keyphrase using all four together. Note that the unigrams placed in the graph can be filtered by part of speech. The authors found that adjectives and nouns were the best to include. Thus, some linguistic knowledge comes into play in this step.

Many documents with known keyphrases are needed. Furthermore, training on a specific domain tends to customize the extraction process to that domain, so the resulting classifier is not necessarily portable, as some of Turney's results demonstrate. Unsupervised keyphrase extraction removes the need for training data. It approaches the problem from a different angle. Instead of trying to learn explicit features that characterize keyphrases, the TextRank algorithm exploits the structure of the text itself to determine keyphrases that appear "central" to the text in the same way that PageRank selects important Web pages.

removing stopwords. Hulth showed that you can get some improvement by selecting examples to be sequences of tokens that match certain patterns of part-of-speech tags. Ideally, the mechanism for generating examples produces all the known labeled keyphrases as candidates, though this is often not the case. For example, if we use only unigrams, bigrams, and trigrams, then we will never be able to extract a known keyphrase containing four words. Thus, recall may suffer. However, generating too many examples can also lead to low precision.
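
The example-generation choices above can be made concrete with a short sketch. It assumes one common reading of "after removing stopwords": candidates that begin or end with a stopword are discarded. The stopword list is a tiny hypothetical one, and splitting on punctuation stands in for proper tokenization.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"a", "an", "and", "by", "from", "in", "its", "of", "the", "to"}


def candidate_phrases(text: str, max_n: int = 3) -> set[str]:
    """All 1..max_n-word sequences that do not cross punctuation and
    neither begin nor end with a stopword."""
    candidates: set[str] = set()
    # Splitting on punctuation (hyphens excepted) keeps candidates inside one chunk.
    for chunk in re.split(r"[^\w\s-]+", text):
        tokens = chunk.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0].lower() in STOPWORDS or gram[-1].lower() in STOPWORDS:
                    continue
                candidates.add(" ".join(gram))
    return candidates


print(sorted(candidate_phrases("the Army Corps of Engineers")))
# ['Army', 'Army Corps', 'Corps', 'Corps of Engineers', 'Engineers']
```

With the default `max_n=3`, the four-word name "Army Corps of Engineers" is never generated, which is exactly the recall limitation described above.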

Candidate examples are typically every unigram, bigram, and trigram found in the text (though other text units are also possible, as discussed below). We then compute various features describing each example (e.g., does the phrase begin with an upper-case letter?). We assume there are known keyphrases available for a set of training documents. Using the known keyphrases, we can assign positive or negative labels to the examples. Then we learn a classifier that can discriminate between positive and negative examples as a function of the features. Some classifiers make a binary classification

representative images or video segments, as stated above. For text, extraction is analogous to the process of skimming, where the summary (if available), headings and subheadings, figures, the first and last paragraphs of a section, and optionally the first and last sentences in a paragraph are read before one chooses to read the entire document in detail. Other examples of extraction include key sequences of text in terms of clinical relevance (including patient/problem, intervention, and outcome).

and cut the time by pointing to the most relevant source documents, a comprehensive multi-document summary should itself contain the required information, limiting the need to access the original files to cases where refinement is required. Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, thus avoiding the subjective bias of a human editor.

We apply the same example-generation strategy to the test documents, then run each example through the learner. We can determine the keyphrases by looking at binary classification decisions or probabilities returned from our learned model. If probabilities are given, a threshold is used to select the keyphrases. Keyphrase extractors are generally evaluated using precision and recall.

non-negative matrix factorization (NMF). Although they did not replace other approaches and are often combined with them, by 2019 machine learning methods dominated the extractive summarization of single documents, which was considered to be nearing maturity. By 2020, the field was still very active, and research was shifting towards abstractive summarization and real-time summarization.

length of the example, relative position of the first occurrence, various Boolean syntactic features (e.g., contains all caps), etc. The Turney paper used about 12 such features. Hulth uses a reduced set of features, which were found most successful in the KEA (Keyphrase Extraction Algorithm) work derived from Turney's seminal paper.
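
A few of the feature types listed above can be computed as follows. This is an illustrative sketch, not Turney's or KEA's actual feature set: the feature names are hypothetical, and phrase matching is a simple case-insensitive substring search rather than tokenized matching.

```python
def phrase_features(phrase: str, document: str) -> dict[str, float]:
    """A few illustrative keyphrase features (names are hypothetical).

    Matching is a case-insensitive substring search, a simplification
    of the tokenized matching a real system would use.
    """
    doc, target = document.lower(), phrase.lower()
    first = doc.find(target)
    return {
        "term_frequency": float(doc.count(target)),
        "length_in_words": float(len(phrase.split())),
        # 0.0 = phrase opens the document; 1.0 = phrase never occurs.
        "relative_first_position": first / len(doc) if first >= 0 and doc else 1.0,
        "starts_upper": float(phrase[:1].isupper()),
        "all_caps": float(phrase.isupper()),
    }


doc = "New Orleans flooded. Pumps failed in New Orleans."
print(phrase_features("New Orleans", doc))
# {'term_frequency': 2.0, 'length_in_words': 2.0, 'relative_first_position': 0.0, 'starts_upper': 1.0, 'all_caps': 0.0}
```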

These algorithms model notions like diversity, coverage, information and representativeness of the summary. Query-based summarization techniques additionally model the relevance of the summary to the query. Some techniques and algorithms that naturally model summarization problems are TextRank and PageRank, submodular set functions, determinantal point processes, and maximal marginal relevance (MMR).

For research articles, many authors provide manually assigned keywords, but most text lacks pre-existing keyphrases. For example, news articles rarely have keyphrases attached, but it would be useful to be able to automatically do so for a number of applications discussed below. Consider the example text from a news article:

"The Army Corps of Engineers, rushing to meet President Bush's promise to protect New Orleans by the start of the 2006 hurricane season, installed defective flood-control pumps last year despite warnings from its own expert that the equipment would fail during a storm, according to documents obtained by The Associated Press."

A new method for multi-lingual multi-document summarization that avoids redundancy generates ideograms to represent the meaning of each sentence in each document, then evaluates similarity by comparing ideogram shape and position. It does not use word frequency, training or preprocessing. It uses two

is an automatic procedure aimed at extracting information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with information contained in a large cluster of documents. In

It is not initially clear why applying PageRank to a co-occurrence graph would produce useful keyphrases. One way to think about it is the following. A word that appears multiple times throughout a text may have many different co-occurring neighbors. For example, in a text about machine learning, the

appear within a window of size N in the original text. N is typically around 2–10. Thus, "natural" and "language" might be linked in a text about NLP. "Natural" and "processing" would also be linked because they would both appear in the same string of N words. These edges build on the notion of "text

The vertices should correspond to what we want to rank. Potentially, we could do something similar to the supervised methods and create a vertex for each unigram, bigram, trigram, etc. However, to keep the graph small, the authors decide to rank individual unigrams in a first step, and then include a

for a test example, while others assign a probability of being a keyphrase. For instance, in the above text, we might learn a rule that says phrases with initial capital letters are likely to be keyphrases. After training a learner, we can select keyphrases for test documents in the following manner.

A keyphrase extractor might select "Army Corps of Engineers", "President Bush", "New Orleans", and "defective flood-control pumps" as keyphrases. These are pulled directly from the text. In contrast, an abstractive keyphrase system would somehow internalize the content and generate keyphrases that do

Intrinsic evaluation assesses the summaries directly, while extrinsic evaluation evaluates how the summarization system affects the completion of some other task. Intrinsic evaluations have assessed mainly the coherence and informativeness of summaries. Extrinsic evaluations, on the other hand, have

Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions being put together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify information search

Designing a supervised keyphrase extraction system involves deciding on several choices (some of these apply to unsupervised, too). The first choice is exactly how to generate examples. Turney and others have used all possible unigrams, bigrams, and trigrams without intervening punctuation and after

Since this method simply ranks the individual vertices, we need a way to threshold or produce a limited number of keyphrases. The technique chosen is to set a count T to be a user-specified fraction of the total number of vertices in the graph. Then the top T vertices/unigrams are selected based on

Approaches aimed at higher summarization quality rely on combined software and human effort. In Machine Aided Human Summarization, extractive techniques highlight candidate passages for inclusion (to which the human adds or removes text). In Human Aided Machine Summarization, a human post-processes

Multi-document extractive summarization faces a problem of redundancy. Ideally, we want to extract sentences that are both "central" (i.e., contain the main ideas) and "diverse" (i.e., they differ from one another). For example, in a set of news articles about some event, each article is likely to

A more principled way to estimate sentence importance is using random walks and eigenvector centrality. LexRank is an algorithm essentially identical to TextRank, and both use this approach for document summarization. The two methods were developed by different groups at the same time, and LexRank

summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection. Video summarization algorithms identify and

Submodular functions have achieved state-of-the-art results for almost all summarization problems. For example, work by Lin and Bilmes, 2012 shows that submodular functions achieve the best results to date on the DUC-04, DUC-05, DUC-06 and DUC-07 benchmarks for document summarization. Similarly, work by Lin and

We also need to create features that describe the examples and are informative enough to allow a learning algorithm to discriminate keyphrases from non-keyphrases. Typically features involve various term frequencies (how many times a phrase appears in the current text or in a larger corpus), the

system. Video summarization is a related domain, where the system automatically creates a trailer of a long video. This also has applications in consumer or personal videos, where one might want to skip the boring or repetitive actions. Similarly, in surveillance videos, one would want to extract

Submodular functions have also been used for other summarization tasks. Tschiatschek et al., 2014 show that mixtures of submodular functions achieve state-of-the-art results for image collection summarization. Similarly, Bairi et al., 2015 show the utility of submodular functions for summarizing

The state-of-the-art results for multi-document summarization are obtained using mixtures of submodular functions. These methods have achieved state-of-the-art results on the DUC-04 to DUC-07 document summarization corpora. Similar results were achieved with the use of determinantal point processes

In short, the co-occurrence graph will contain densely connected regions for terms that appear often and in different contexts. A random walk on this graph will have a stationary distribution that assigns large probabilities to the terms in the centers of the clusters. This is similar to densely

between the text unit vertices. Unlike PageRank, the edges are typically undirected and can be weighted to reflect a degree of similarity. Once the graph is constructed, it is used to form a stochastic matrix, combined with a damping factor (as in the "random surfer model"), and the ranking over

overlaps between automatically generated summaries and previously written human summaries. It is recall-based to encourage inclusion of all important topics in summaries. Recall can be computed with respect to unigram, bigram, trigram, or 4-gram matching. For example, ROUGE-1 is the fraction of

Abstractive summarization methods generate new text that did not exist in the original text. This has been applied mainly for text. Abstractive methods build an internal semantic representation of the original content (often called a language model), and then use this representation to create a

Domain-independent summarization techniques apply sets of general features to identify information-rich text segments. Recent research focuses on domain-specific summarization using knowledge specific to the text's domain, such as medical knowledge and ontologies for summarizing medical texts.

to model diversity. Similarly, the Maximum-Marginal-Relevance procedure can also be seen as an instance of submodular optimization. All of these models, which encourage coverage, diversity and information, are submodular. Moreover, submodular functions can be efficiently combined, and the

In the end, the system will need to return a list of keyphrases for a test document, so we need to have a way to limit the number. Ensemble methods (i.e., using votes from several classifiers) have been used to produce numeric scores that can be thresholded to provide a user-provided number of

Jorge E. Camargo and Fabio A. González. A Multi-class Kernel Alignment Method for Image Collection Summarization. In Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP '09), Eduardo

Human judgement often varies greatly in what it considers a "good" summary, so creating an automatic evaluation process is particularly difficult. Manual evaluation can be used, but this is both time- and labor-intensive, as it requires humans to read not only the summaries but also the source

Supervised text summarization is very much like supervised keyphrase extraction. Basically, if you have a collection of documents and human-generated summaries for them, you can learn features of sentences that make them good candidates for inclusion in the summary. Features might include the

unigrams that appear in both the reference summary and the automatic summary out of all unigrams in the reference summary. If there are multiple reference summaries, their scores are averaged. A high level of overlap should indicate a high degree of shared concepts between the two summaries.
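
ROUGE-1 recall as described here can be sketched in a few lines. This is a simplified illustration, not the official ROUGE toolkit: it tokenizes by lowercase whitespace splitting, applies no stemming or stopword handling, and clips overlap counts so that a word repeated in the reference is matched at most as many times as it occurs in the automatic summary.

```python
from collections import Counter


def rouge_1_recall(candidate: str, references: list[str]) -> float:
    """Unigram recall of `candidate` against each reference, averaged.

    Overlap is clipped: a reference word is matched at most as many times
    as it occurs in the candidate. Tokenization is lowercase whitespace
    splitting with no stemming (a simplification).
    """
    cand_counts = Counter(candidate.lower().split())
    scores = []
    for reference in references:
        ref_counts = Counter(reference.lower().split())
        total = sum(ref_counts.values())
        overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
        scores.append(overlap / total if total else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


print(rouge_1_recall("the cat sat", ["the cat sat on the mat"]))  # 0.5
```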

An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a

The main drawback of the evaluation systems so far is that we need a reference summary (for some methods, more than one), to compare automatic summaries with models. This is a hard and expensive task. Much effort has to be made to create corpora of texts and their corresponding summaries.

is used to learn parameters for a domain-specific keyphrase extraction algorithm. The extractor follows a series of heuristics to identify keyphrases. The genetic algorithm optimizes parameters for these heuristics with respect to performance on training documents with known keyphrases.

Here, content is extracted from the original data, but the extracted content is not modified in any way. Examples of extracted content include key-phrases that can be used to "tag" or index a text document, or key sentences (including headings) that collectively comprise an abstract, and

their stationary probabilities. A post-processing step is then applied to merge adjacent instances of these T unigrams. As a result, potentially more or fewer than T final keyphrases will be produced, but the number should be roughly proportional to the length of the original text.
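
The selection and merging steps can be sketched as follows. The function name is hypothetical, the default fraction of one third is an illustrative assumption rather than a prescribed value, and ties in score are broken by the order in which vertices appear in `scores`.

```python
def extract_keyphrases(tokens: list[str], scores: dict[str, float],
                       fraction: float = 1 / 3) -> list[str]:
    """Select the top T = fraction * |V| vertices, then merge runs of
    adjacent selected tokens in the original text into keyphrases.

    `tokens` is the original word sequence and `scores` maps each vertex
    (unigram) to its stationary probability. Phrases are returned in order
    of first appearance, without duplicates.
    """
    t = max(1, round(fraction * len(scores)))
    top = set(sorted(scores, key=scores.get, reverse=True)[:t])
    phrases: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # Emit the current run of adjacent selected tokens as one phrase.
        if run:
            phrase = " ".join(run)
            if phrase not in phrases:
                phrases.append(phrase)
            run.clear()

    for token in tokens:
        if token in top:
            run.append(token)
        else:
            flush()
    flush()
    return phrases


tokens = ["supervised", "learning", "and", "supervised", "classification"]
scores = {"supervised": 0.3, "learning": 0.3, "classification": 0.3, "and": 0.1}
print(extract_keyphrases(tokens, scores, fraction=0.75))
# ['supervised learning', 'supervised classification']
```

The example reproduces the "supervised learning" and "supervised classification" case: "and" is not among the top T vertices, so it splits the two phrases.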

random walks (a random walk where certain states end the walk). The algorithm is called GRASSHOPPER. In addition to explicitly promoting diversity during the ranking process, GRASSHOPPER incorporates a prior ranking (based on sentence position in the case of summarization).

In this way, TextRank does not rely on any previous training data at all, but rather can be run on any arbitrary piece of text, and it can produce output simply based on the text's intrinsic properties. Thus the algorithm is easily portable to new domains and languages.

and often a deep understanding of the domain of the original text in cases where the original document relates to a special field of knowledge. "Paraphrasing" is even more difficult to apply to images and videos, which is why most summarization systems are extractive.

Image collection summarization is another application example of automatic summarization. It consists of selecting a representative set of images from a larger set of images. A summary in this context is useful to show the most representative images of results in an

resulting function is still submodular. Hence, one could combine one submodular function that models diversity with another that models coverage, and use human supervision to learn the right model of a submodular function for the problem.
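
A minimal sketch of such a combination, assuming a precomputed sentence-similarity matrix and a precomputed clustering (both hypothetical inputs). The facility-location term rewards covering every item with a similar selected item; the square-root-per-cluster term is one concave diversity reward of the kind used in the submodular summarization literature. Non-negative weights keep the sum submodular, and the last lines check diminishing returns on a toy example.

```python
import math


def facility_location(selected: set[int], sim: list[list[float]]) -> float:
    """Coverage/representativeness: each item is credited with its
    similarity to the closest selected item."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))


def diversity_reward(selected: set[int], clusters: list[int],
                     reward: list[float]) -> float:
    """Diversity: a square root per cluster gives diminishing returns for
    picking several items from the same cluster."""
    per_cluster: dict[int, float] = {}
    for j in selected:
        per_cluster[clusters[j]] = per_cluster.get(clusters[j], 0.0) + reward[j]
    return sum(math.sqrt(total) for total in per_cluster.values())


def mixture(selected: set[int], sim: list[list[float]], clusters: list[int],
            reward: list[float], w_cov: float = 1.0, w_div: float = 1.0) -> float:
    """Non-negative weighted sum of submodular functions (still submodular)."""
    return (w_cov * facility_location(selected, sim)
            + w_div * diversity_reward(selected, clusters, reward))


# Items 0 and 1 are near-duplicates in cluster 0; item 2 is different.
sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
clusters, reward = [0, 0, 1], [1.0, 1.0, 1.0]


def gain(item: int, base: set[int]) -> float:
    """Marginal gain of adding `item` to the set `base`."""
    return mixture(base | {item}, sim, clusters, reward) - mixture(base, sim, clusters, reward)


print(round(gain(1, {0}), 3), round(gain(2, {0}), 3))  # 0.614 1.9
```

Adding the dissimilar item 2 is worth more than adding the near-duplicate item 1, and its gain shrinks as the base set grows, which is the diminishing-returns property that defines submodularity.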

Precision measures how many of the proposed keyphrases are actually correct. Recall measures how many of the true keyphrases your system proposed. The two measures can be combined in an F-score, which is the harmonic mean of the two: F = 2PR / (P + R).

by the sentences' lengths). The LexRank paper explored using unweighted edges after applying a threshold to the cosine values, but also experimented with using edges with weights equal to the similarity score. TextRank uses continuous

"autotldr", created in 2011, summarizes news articles in the comment sections of Reddit posts. It was found to be very useful by the Reddit community, which upvoted its summaries hundreds of thousands of times. The name is a reference to TL;DR, Internet slang for "too long; didn't read".

Another keyphrase extraction algorithm is TextRank. While supervised methods have some nice properties, like being able to produce interpretable rules for what features characterize a keyphrase, they also require a large amount of

The unsupervised approach to summarization is also quite similar in spirit to unsupervised keyphrase extraction and gets around the issue of costly training data. Some unsupervised summarization approaches are based on finding a

A related method is Maximal Marginal Relevance (MMR), which uses a general-purpose graph-based ranking algorithm like Page/Lex/TextRank that handles both "centrality" and "diversity" in a unified mathematical framework based on

Bilmes, 2011, shows that many existing systems for automatic summarization are instances of submodular functions. This was a breakthrough result establishing submodular functions as the right models for summarization problems.

Essentially, it runs PageRank on a graph specially designed for a particular NLP task. For keyphrase extraction, it builds a graph using some set of text units as vertices. Edges are based on some measure of semantic or

In both algorithms, the sentences are ranked by applying PageRank to the resulting graph. A summary is formed by combining the top ranking sentences, using a threshold or length cutoff to limit the size of the summary.
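
The whole pipeline can be sketched compactly. This sketch uses the TextRank-style overlap similarity (shared words divided by the sum of the log sentence lengths) as edge weights, a conventional damping factor of 0.85, and a word budget as the length cutoff. LexRank would instead use cosine similarity, and all function names here are hypothetical.

```python
import math


def overlap(s1: list[str], s2: list[str]) -> float:
    """TextRank-style similarity: shared words / (log|s1| + log|s2|)."""
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    return len(set(s1) & set(s2)) / denom if denom > 0 else 0.0


def rank_sentences(sentences: list[str], damping: float = 0.85,
                   iterations: int = 100) -> list[float]:
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    words = [s.lower().split() for s in sentences]
    w = [[overlap(words[i], words[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass on sentences with no edges is spread uniformly.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        scores = [
            (1 - damping) / n
            + damping * (sum(scores[j] * w[j][i] / out_weight[j]
                             for j in range(n) if out_weight[j] > 0)
                         + dangling / n)
            for i in range(n)
        ]
    return scores


def summarize(sentences: list[str], max_words: int) -> list[str]:
    """Take sentences in score order while they fit the word budget,
    then restore document order."""
    scores = rank_sentences(sentences)
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "the pumps failed during the storm",
    "the storm damaged the pumps",
    "engineers had warned about the pumps",
    "the weather tomorrow is sunny",
]
print(summarize(sentences, max_words=11))
```

The off-topic weather sentence shares only "the" with the others, so it receives the lowest score. Restoring document order after selection is a design choice here, not something either algorithm prescribes.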

Zhang, J., Zhao, Y., Saleh, M., & Liu, P. (2020, November). Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning (pp. 11328-11339).

Like keyphrase extraction, document summarization aims to identify the essence of a text. The only real difference is that now we are dealing with larger text units—whole sentences instead of words and phrases.

maximum entropy (ME) classifier for the meeting summarization task, as ME is known to be robust against feature dependencies. Maximum entropy has also been applied successfully for summarization in the broadcast news domain.

A related application is summarizing news articles. Imagine a system that automatically pulls together news articles on a given topic (from the web) and concisely represents the latest news as a summary.

A promising approach is adaptive document/text summarization. It involves first recognizing the text genre and then applying summarization algorithms optimized for this genre. Such software has been created.

a given set of concepts. For example, in document summarization, one would like the summary to cover all important and relevant concepts in the document. This is an instance of set cover. Similarly, the

The task is the following. You are given a piece of text, such as a journal article, and you must produce a list of keywords or keyphrases that capture the primary topics discussed in the text. In the case of

Furthermore, some methods require manual annotation of the summaries (e.g. SCU in the Pyramid Method). Moreover, they all perform a quantitative evaluation with regard to different similarity metrics.

with either user-specified or automatically tuned weights. In this case, some training documents might be needed, though the TextRank results show the additional features are not absolutely necessary.

Luhn, Hans Peter (1957). "A Statistical Approach to Mechanized Encoding and Searching of Literary Information". IBM Journal of Research and Development. 1 (4): 309–317. doi:10.1147/rd.14.0309.

Query-based summarization summarizes objects specific to a query. Summarization systems are able to create both query-relevant text summaries and generic machine-generated summaries depending on what the user needs.

Transformer-based sequence-to-sequence models have provided flexibility in mapping text sequences to text sequences of a different type, which is well suited to automatic summarization. These include models such as T5 and Pegasus.

At a very high level, summarization algorithms try to find subsets of objects (like a set of sentences, or a set of images) that cover the information of the entire set. This is also called the

Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. "An analysis of approximations for maximizing submodular set functions—I." Mathematical Programming 14.1 (1978): 265-294.

Pan, Xingjia; Tang, Fan; Dong, Weiming; Ma, Chongyang; Meng, Yiping; Huang, Feiyue; Lee, Tong-Yee; Xu, Changsheng (2021-04-01). "Content-Based Visual Summarization for Image Collection".

Intra-textual evaluation assesses the output of a specific summarization system, while inter-textual evaluation focuses on contrastive analysis of the outputs of several summarization systems.

sections of the source document, to condense a text more strongly than extraction. Such transformation, however, is computationally much more challenging than extraction, involving both

admits a constant factor guarantee. Moreover, the greedy algorithm is extremely simple to implement and can scale to large datasets, which is very important for summarization problems.
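
The greedy algorithm is short enough to show in full. The sketch below maximizes any set function under a cardinality budget `k`; for monotone submodular functions, Nemhauser, Wolsey and Fisher (1978) showed that this achieves at least a (1 - 1/e) fraction of the optimum. The coverage function and its data are hypothetical, and budgets on summary length in words usually call for a cost-aware variant that is not shown here.

```python
from collections.abc import Callable, Hashable


def greedy_maximize(ground_set: list[Hashable],
                    f: Callable[[frozenset], float], k: int) -> list[Hashable]:
    """Greedy maximization of a set function under a cardinality budget k.

    At each step, add the element with the largest marginal gain
    f(S + {x}) - f(S); stop early if no element has positive gain.
    """
    selected: list[Hashable] = []
    current: frozenset = frozenset()
    for _ in range(k):
        base = f(current)
        best, best_gain = None, 0.0
        for x in ground_set:
            if x in current:
                continue
            gain = f(current | {x}) - base
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:
            break
        selected.append(best)
        current = current | {best}
    return selected


# Set-cover style coverage: the number of distinct concepts a summary mentions.
concepts = {
    "s1": {"pumps", "storm"},
    "s2": {"pumps"},
    "s3": {"warnings", "experts"},
    "s4": {"storm", "warnings"},
}


def coverage(summary: frozenset) -> float:
    return float(len(set().union(*(concepts[s] for s in summary))))


print(greedy_maximize(list(concepts), coverage, k=2))  # ['s1', 's3']
```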

keyphrases. This is the technique used by Turney with C4.5 decision trees. Hulth used a single binary classifier so the learning algorithm implicitly determines the appropriate number.
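
The two ways of limiting the output can be contrasted in a short sketch. The votes are hypothetical stand-ins for the outputs of an ensemble of classifiers. `top_k` returns a user-provided number of keyphrases, while `above_threshold` lets the count vary with the scores, closer in spirit to letting a classifier decide.

```python
def ensemble_scores(votes: dict[str, list[bool]]) -> dict[str, float]:
    """Score each candidate by the fraction of classifiers voting 'keyphrase'."""
    return {phrase: sum(v) / len(v) for phrase, v in votes.items() if v}


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """A user-provided number of keyphrases, best first (ties alphabetical)."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


def above_threshold(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Every candidate whose score reaches the threshold; the count varies."""
    kept = [p for p, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda p: (-scores[p], p))


votes = {  # hypothetical votes from three classifiers
    "new orleans": [True, True, True],
    "flood-control pumps": [True, False, True],
    "last year": [False, False, True],
}
scores = ensemble_scores(votes)
print(top_k(scores, 1), above_threshold(scores))
# ['new orleans'] ['new orleans', 'flood-control pumps']
```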

is a special case of submodular functions. The Facility Location function also naturally models coverage and diversity. Another example of a submodular optimization problem is using a

normally in a temporally ordered fashion. Video summaries simply retain a carefully selected subset of the original video frames and, therefore, are not identical to the output of

not appear in the text, but more closely resemble what a human might produce, such as "political negligence" or "inadequate protection from floods". Abstraction requires a deep

The "centroid" sentence is the mean word vector of all the sentences in the document. The sentences can then be ranked by their similarity to this centroid sentence.
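
Centroid-based ranking can be sketched as follows, assuming raw term-count vectors for sentences. TF-IDF weights or word embeddings could be substituted for a richer "word vector", and the function names are illustrative.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(v * b[w] for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_centroid(sentences: list[str]) -> list[str]:
    """Order sentences by cosine similarity to the centroid (mean) vector."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    centroid: Counter = Counter()
    for vec in vectors:
        for word, count in vec.items():
            centroid[word] += count / len(vectors)
    order = sorted(range(len(sentences)),
                   key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in order]


sentences = ["flood pumps failed", "flood pumps were defective", "a parade was held"]
print(rank_by_centroid(sentences))
# ['flood pumps were defective', 'flood pumps failed', 'a parade was held']
```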

had been used by 2016. Pattern-based summarization was the most powerful option for multi-document summarization found by 2016. In the following year it was surpassed by

Once examples and features are created, we need a way to learn to predict keyphrases. Virtually any supervised learning algorithm could be used, such as decision trees,

Widyassari, Adhika Pramita; Rustad, Supriadi; Shidik, Guruh Fajar; Noersasongko, Edi; Syukur, Abdul; Affandy, Affandy; Setiadi, De Rosal Ignatius Moses (2020-05-20).

Generic summarization focuses on obtaining a generic summary or abstract of the collection (whether documents, sets of images, videos, news stories, etc.). The second is query-based summarization.

Similarly, for image summarization, Tschiatschek et al. developed a Visual-ROUGE score that judges the performance of image summarization algorithms.

Such deep understanding makes abstraction difficult for a computer system. Keyphrases have many applications. They can enable document browsing by providing a short summary, improve information retrieval

is a special case of submodular optimization, since the set cover function is submodular. The set cover function attempts to find a subset of objects which

and statistical language models for modeling salience. Although the system exhibited good results, the researchers wanted to explore the effectiveness of a

ROUGE cannot determine whether the result is coherent, that is, whether the sentences flow together sensibly. Higher-order n-gram ROUGE measures help to some degree.

It is worth noting that TextRank was applied to summarization exactly as described here, while LexRank was used as part of a larger summarization system (

While submodular functions are a good fit for summarization problems, they also admit very efficient algorithms for optimization. For example, a simple

Matches between the proposed keyphrases and the known keyphrases can be checked after stemming or applying some other text normalization.
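
The evaluation can be sketched as follows. The `normalize` function is a deliberately crude, hypothetical stand-in for a real stemmer (such as Porter's): it lowercases and strips a trailing "s", which is enough to match singular and plural forms in this toy example.

```python
def normalize(phrase: str) -> str:
    """Crude, hypothetical stand-in for stemming: lowercase each word and
    drop a trailing 's' from words longer than three letters."""
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w
                    for w in phrase.lower().split())


def keyphrase_prf(proposed: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F-score (harmonic mean) after normalization."""
    p_set = {normalize(k) for k in proposed}
    g_set = {normalize(k) for k in gold}
    correct = len(p_set & g_set)
    precision = correct / len(p_set) if p_set else 0.0
    recall = correct / len(g_set) if g_set else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


proposed = ["new orleans", "defective flood-control pump", "hurricane season"]
gold = ["Army Corps of Engineers", "President Bush", "New Orleans",
        "defective flood-control pumps"]
print(tuple(round(x, 3) for x in keyphrase_prf(proposed, gold)))  # (0.667, 0.5, 0.571)
```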

user-supplied parameters: equivalence (when are two sentences to be considered equivalent?) and relevance (how long is the desired summary?).

Miranda-Jiménez, Sabino, Gelbukh, Alexander, and Sidorov, Grigori (2013). "Summarizing Conceptual Graphs for Automatic Summarization Task".

Elhamifar, Ehsan; Sapiro, Guillermo; Vidal, Rene (2012). "See all by looking at a few: Sparse modeling for finding representative objects".

developed a sentence extraction system for multi-document summarization in the news domain. The system was based on a hybrid system using a

methods, designed to locate the most informative sentences in a given document. On the other hand, visual content can be summarized using

ROUGE is a recall-based measure of how well a summary covers the content of human-generated summaries known as references. It calculates

The idea is that words that appear near each other are likely related in a meaningful way and "recommend" each other to the reader.

Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1998.

has recently emerged as a powerful modeling tool for various summarization problems. Submodular functions naturally model notions of

connected Web pages getting ranked highly by PageRank. This approach has also been used in document summarization, considered below.
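
A minimal power-iteration sketch of PageRank on an undirected co-occurrence graph, given as an adjacency map. Each undirected edge counts as a link in both directions, the damping factor 0.85 is a conventional choice rather than a requirement, and the (1 - d)/|V| teleport term keeps the scores a probability distribution. The TextRank paper writes the teleport term as (1 - d) without dividing by |V|, which for graphs without isolated vertices only rescales the scores.

```python
def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank on an undirected, unweighted adjacency map.

    `graph` must be symmetric (v in graph[u] iff u in graph[v]). Mass sitting
    on isolated vertices is spread uniformly so the scores always sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(scores[v] for v in nodes if not graph[v])
        new = {
            v: (1 - damping) / n
            + damping * (sum(scores[u] / len(graph[u]) for u in graph[v]) + dangling / n)
            for v in nodes
        }
        delta = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < tol:
            break
    return scores


hub = {"learning": {"machine", "supervised", "unsupervised", "semi-supervised"}}
star = {**hub, **{leaf: {"learning"} for leaf in hub["learning"]}}
ranks = pagerank(star)
print(max(ranks, key=ranks.get))  # learning
```

On this star-shaped graph the "hub" word receives by far the largest stationary probability, matching the intuition about a word that co-occurs with many different neighbors.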

Sarker, Abeed; Molla, Diego; Paris, Cecile (2013). "An Approach for Query-Focused Text Summarisation for Evidence Based Medicine".

multi-document topic hierarchies. Submodular functions have also been used successfully for summarizing machine learning datasets.

There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on. The first is

Alrehamy, Hassan H; Walker, Coral (2018). "SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation".

may make use of summaries, if the detail lost is not major and the summary is sufficiently stylistically different to the input.

The most common way to evaluate the informativeness of automatic summaries is to compare them with human-made model summaries.

simply focused on summarization, but could just as easily be used for keyphrase extraction or any other NLP ranking task.

that combines the LexRank score (stationary probability) with other features like sentence position and length using a

Published in Proceedings of RIAO '10: Adaptivity, Personalization and Fusion of Heterogeneous Information, CID, Paris, France.

The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), 2011.

Yatsko, V. A.; Starikov, M. S.; Butakov, A. V. (2010). "Automatic genre recognition and adaptive text summarization".

The most common way to evaluate summaries is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). ROUGE is commonly used to evaluate summarization and translation systems, including in NIST's Document Understanding Conferences.

Yatsko, V. A.; Vishnyakov, T. N. (2007). "A method for evaluating modern systems of automatic text summarization".

The edges between sentences are based on some form of semantic similarity or content overlap.

Essential summarizer: innovative automatic text summarization software in twenty languages - ACM Digital Library

Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation

Alrehamy, Hassan (2018). "SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation".

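Keyphrases can improve information retrieval (if documents have keyphrases assigned, a user could search by keyphrase to produce more reliable hits than a full-text search) and can be employed in generating index entries for a large text corpus.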

In both LexRank and TextRank, a graph is constructed by creating a vertex for each sentence in the document.
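
A minimal sketch of sentence ranking in the TextRank style follows, assuming the input is an already-split list of sentences and words are extracted with a simple regular expression. Edge weights use TextRank's overlap measure (shared words divided by the sum of the logarithms of the two sentence lengths), and scores come from weighted PageRank computed by fixed-point iteration. The function names and the damping value 0.85 are illustrative assumptions.

```python
import math
import re


def textrank_similarity(a, b):
    """Words in common, normalized by the log lengths of both sentences."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom == 0:  # both sentences are a single word
        return 0.0
    return len(set(wa) & set(wb)) / denom


def rank_sentences(sentences, damping=0.85, iterations=100):
    """Weighted PageRank over the graph of pairwise sentence similarities."""
    n = len(sentences)
    w = [[textrank_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per vertex
    score = [1.0] * n
    for _ in range(iterations):
        score = [(1 - damping) + damping * sum(w[j][i] / out[j] * score[j]
                                               for j in range(n) if out[j] > 0)
                 for i in range(n)]
    return score


def summarize(sentences, k):
    """Pick the k highest-scoring sentences and return them in document order."""
    score = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: (-score[i], i))[:k]
    return [sentences[i] for i in sorted(best)]


docs = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "A dog barked at the cat.",
    "Stock prices rose sharply today.",
]
print(summarize(docs, 2))
```

The unrelated stock-market sentence shares no words with the others, so it has no incoming edges, receives only the base term 1 - d, and is never selected.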

Text data management and analysis : a practical introduction to information retrieval and text mining

While LexRank uses cosine similarity of TF-IDF vectors, TextRank uses a very similar measure based on the number of words two sentences have in common (normalized by the sentences' lengths).

software output, in the same way that one edits the output of automatic translation by Google Translate.

summary that is closer to what a human might express. Abstraction may transform the extracted content by paraphrasing it.

selects important Web pages. Recall that this is based on the notion of "prestige" or "recommendation" from social networks.

Extrinsic evaluations, on the other hand, have tested the impact of summarization on tasks like relevance assessment and reading comprehension.

In surveillance videos, for example, one might want to extract important and suspicious activity while ignoring all the boring and redundant frames captured.

Challenging Issues of Automatic Summarization: Relevance Detection and Quality-based Evaluation

Beginning with the work of Turney, many researchers have approached keyphrase extraction as a supervised machine learning problem.

Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data.

Bayro-Corrochano and Jan-Olof Eklundh (Eds.). Springer-Verlag, Berlin, Heidelberg, 545-552.

The use of MMR, diversity-based reranking for reordering documents and producing summaries

Depending on the different literature and the definition of key terms, words or phrases, keyword extraction is a highly related theme.

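Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content.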

Learning mixtures of submodular shells with application to document summarization

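Moreover, several important combinatorial optimization problems occur as special instances of submodular optimization. For example, the set cover problem is a special case of submodular optimization, since the set cover function is submodular.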

Mademlis, Ioannis; Tefas, Anastasios; Nikolaidis, Nikos; Pitas, Ioannis (2016).

Xiaojin, Zhu, Andrew Goldberg, Jurgen Van Gael, and David Andrzejewski (2007).

https://www.dummies.com/education/language-arts/speed-reading/how-to-skim-text/

Evaluation can be intrinsic or extrinsic, and inter-textual or intra-textual.

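Edges are created based on word co-occurrence in this application of TextRank. Two vertices are connected by an edge if the unigrams appear within a window of N words in the original text.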

Author Obfuscation: Attacking the State of the Art in Authorship Verification

Learning Mixtures of Submodular Functions for Image Collection Summarization

Multi-document summarization may also be done in response to a question.

Unlike TextRank, LexRank has been applied to multi-document summarization.
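
Because LexRank only needs a pool of sentences, applying it to several documents amounts to merging their sentences into one graph. The sketch below follows the thresholded form of LexRank under stated assumptions: IDF is computed over the pooled sentences, sentences are split at end punctuation by a regular expression, the default threshold of 0.1 and the damping factor of 0.85 are illustrative, and `tfidf_vectors`, `cosine` and `lexrank` are hypothetical names.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors; each sentence counts as one 'document' for IDF."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    df = Counter(word for bag in bags for word in bag)
    n = len(sentences)
    return [{w: tf * math.log(n / df[w]) for w, tf in bag.items()} for bag in bags]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lexrank(documents, threshold=0.1, damping=0.85, iterations=100):
    """Rank the pooled sentences of a document cluster, most central first."""
    sentences = [s for doc in documents for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]
    n = len(sentences)
    vecs = tfidf_vectors(sentences)
    # Unweighted graph: link two distinct sentences whose similarity exceeds the threshold.
    adj = [[1 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    # Power iteration for the stationary distribution of a random walk with
    # uniform teleportation; isolated sentences only receive the teleport mass.
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [(1 - damping) / n + damping * sum(p[j] / degree[j] for j in range(n) if adj[j][i])
             for i in range(n)]
    order = sorted(range(n), key=lambda i: (-p[i], i))
    return [sentences[i] for i in order]


cluster = [
    "The storm hit the coast. Thousands lost power.",
    "The storm hit the coast overnight. Power was restored by noon.",
    "Officials said the storm hit the coast hard.",
]
for sentence in lexrank(cluster, threshold=0.2)[:3]:
    print(sentence)
```

With a threshold of 0.2, the three storm sentences (one from each document) form a connected cluster and outrank the two power-outage sentences, whose mutual similarity (about 0.1) falls below the threshold and leaves them isolated.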

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Turney, Peter D (2002). "Learning Algorithms for Keyphrase Extraction".

In video synopsis algorithms, by contrast, new video frames are synthesized based on the original video content.

In this way, multi-document summarization systems complement news aggregators.

Summarizing Multi-Document Topic Hierarchies using Submodular Mixtures

Ramakrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan and Jeff Bilmes,

Mastering Data Mining with Python – Find patterns hidden in your data

Sankar K. Pal; Alfredo Petrosino; Lucia Maddalena (25 January 2012).

Journal of King Saud University - Computer and Information Sciences

Sebastian Tschiatschek, Rishabh Iyer, Hoachen Wei and Jeff Bilmes,

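Video summarization algorithms identify and extract from the original video content the most important frames (key-frames) and/or the most important video segments (key-shots).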

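The importance of the vertices is obtained by finding the eigenvector corresponding to eigenvalue 1 (i.e., the stationary distribution of the random walk on the graph).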

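Learning methods such as Naive Bayes and rule induction have been applied to this problem. In the case of Turney's GenEx algorithm, a genetic algorithm is used to learn parameters for a domain-specific keyphrase extraction algorithm.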

2012 IEEE Conference on Computer Vision and Pattern Recognition

Determinantal point processes (which are a special case of submodular functions) have also been applied to the DUC-04 summarization task.

The Theory and Practice of Discourse Parsing and Summarization

Performance Confidence Estimation for Automatic Summarization

Mademlis, Ioannis; Tefas, Anastasios; Pitas, Ioannis (2018).

There are two general approaches to automatic summarization: extraction and abstraction.

Automatic Summarization of Meeting Data: A Feasibility Study

Given a document, we construct an example for each unigram, bigram, and trigram found in the text.

Improving diversity in ranking using absorbing random walks

Improving Diversity in Ranking using Absorbing Random Walks

Submodularity in Data Subset Selection and Active Learning

A Class of Submodular Functions for Document Summarization

Specific applications of automatic summarization include:

The Use of Topic Segmentation for Automatic Summarization

Versatile question answering systems: seeing in synthesis

Potthast, Martin; Hagen, Matthias; Stein, Benno (2016).

IEEE Transactions on Visualization and Computer Graphics

The first publication in the area dates back to 1957 (Hans Peter Luhn), starting with a statistical technique.

Domain-specific versus domain-independent summarization

Submodular functions as generic tools for summarization

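Summaries can also be generated from multiple source documents (for example, a cluster of articles on the same topic). This problem is called multi-document summarization.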

Multi-document summarization systems thus take the next step in coping with information overload.

Conceptual Structures for STEM Research and Education

Automatic Documentation and Mathematical Linguistics

Automatic Documentation and Mathematical Linguistics

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
