The classification is done according to some ideals and reflects the purpose of the library or database doing the classification. In this sense it is not necessarily a kind of classification or indexing based on user studies. Only if empirical data about use or users are applied should request-oriented classification be regarded as a user-based approach.
(or -indexing) is classification in which the anticipated requests from users influence how documents are classified. The classifier asks themselves: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230).
The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly the domain of information science and computer science. The problems overlap, however, and there is therefore interdisciplinary research on document classification.
has argued, this distinction is not fruitful. “These terminological distinctions,” he writes, “are quite meaningless and only serve to cause confusion” (Lancaster, 2003, p. 21). The view that this distinction is purely superficial is also supported by the fact that a classification system may be
Request-oriented classification may be classification that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may classify or index documents differently from a historical library. It is probably better, however, to understand
to a document) is at the same time to assign that document to the class of documents indexed by that term (all documents indexed or classified as X belong to the same class of documents). In other words, labeling a document is the same as assigning it to the class of documents indexed under that
is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a common rule for classification in libraries that at least 20% of the content of a book should be about the class to which the book is
or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: the content-based approach and the request-based approach.
Aitchison, J. (1986). "A classification as a source for thesaurus: The Bibliographic Classification of H. E. Bliss as a source of thesaurus terms and structure." Journal of Documentation, Vol. 42 No. 3, pp.
Library of Congress (2008). The subject headings manual. Washington, DC: Library of Congress, Policy and Standards Division. (Sheet H 180: "Assign headings only for topics that comprise at least 20% of the
X. Dai, M. Bikdash and B. Meyer, "From social media to public health surveillance: Word embedding based clustering method for twitter classification," SoutheastCon 2017, Charlotte, NC, 2017, pp. 1-7.
Aitchison, J. (2004). "Thesauri from BC2: Problems and possibilities revealed in an experimental thesaurus derived from the Bliss Music schedule." Bliss Classification Bulletin, Vol. 46, pp. 20-26.
article triage, selecting articles that are relevant for manual literature curation, for example as the first step in generating manually curated annotation databases in biology
Riesthuis, G. J. A., & Bliedung, St. (1991). "Thesaurification of the UDC." Tools for knowledge organization and the human interface, Vol. 2, pp. 109-117. Index Verlag, Frankfurt.
and vice versa (cf., Aitchison, 1986, 2004; Broughton, 2008; Riesthuis & Bliedung, 1991). Therefore, the act of labeling a document (say by assigning a term from a
The documents to be classified may be texts, images, music, etc. Each kind of document poses its own classification problems. When not otherwise specified,
294:, automatically determining the degree of readability of a text, either to find suitable materials for different age groups or reader types or as part of a larger
A faceted classification as the basis of a faceted terminology: Conversion of a classified structure to thesaurus format in the Bliss Bibliographic Classification
644:. In Sergei Nirenburg, Douglas Appelt, Fabio Ciravegna and Robert Dale, eds., Proc. 6th Applied Natural Language Processing Conf. (ANLP'00), pp. 158–165, ACL.
160:, where parts of the documents are labeled by the external mechanism. There are several software products under various license models available.
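As a rough illustration of the supervised case, the sketch below trains a small multinomial naive Bayes classifier from labeled example texts and uses it to label a new text. It is a minimal sketch under stated assumptions, not a reference implementation: the function names `tokenize`, `train`, and `predict` are hypothetical, tokenization is a simple lowercase word split, and add-one (Laplace) smoothing is an assumed choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (illustrative choice)."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs.

    The labels play the role of the external feedback that makes the task supervised.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing so unseen word/class pairs do not zero out the score.
        denominator = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model built with `train([("cheap pills buy now", "spam"), ("project meeting tomorrow", "ham")])` can then be queried with `predict(model, "buy cheap pills")`. An unsupervised variant would drop the labels and group documents by similarity instead.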
304:, determining the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document.
where some external mechanism (such as human feedback) provides information on the correct classification for documents,
Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts
Sometimes a distinction is made between assigning documents to classes ("classification") versus assigning
663:, BCS IRSG Symposium: Future Directions in Information Access, London, UK, pp. 54–63, archived from
assigned. In automatic classification it could be the number of times given words appear in a document.
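To make the word-count idea concrete, the following sketch assigns a text to every class whose keywords make up at least a given share of its words. It is only an illustration: the keyword lists in `CLASS_KEYWORDS`, the function names, and the use of token share as a stand-in for "share of content" are all assumptions, and the 20% default merely echoes the library rule of thumb mentioned in the text.

```python
import re
from collections import Counter

# Hypothetical keyword lists per class, for illustration only.
CLASS_KEYWORDS = {
    "music": {"music", "melody", "orchestra", "composer"},
    "biology": {"cell", "protein", "gene", "species"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def class_weights(text):
    """Return, for each class, the fraction of tokens that are class keywords."""
    tokens = tokenize(text)
    if not tokens:
        return {label: 0.0 for label in CLASS_KEYWORDS}
    counts = Counter(tokens)
    return {
        label: sum(counts[word] for word in keywords) / len(tokens)
        for label, keywords in CLASS_KEYWORDS.items()
    }


def assign_classes(text, threshold=0.20):
    """Assign every class whose keyword share reaches the threshold."""
    weights = class_weights(text)
    return sorted(label for label, weight in weights.items() if weight >= threshold)
```

For a short text such as "The composer wrote a melody for the orchestra", three of the eight tokens are music keywords, so `assign_classes` would return only the music class. Real systems usually weight terms (for example with tf–idf) rather than relying on raw shares.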
Lancaster, F. W. (2003). Indexing and abstracting in theory and practice. Library Association, London.
156:), where the classification must be done entirely without reference to external information, and
, forwarding an email sent to a general address to a specific address or mailbox depending on its topic
Learning to Classify Text - Chap. 6 of the book Natural Language Processing with Python
719:"Overview of the protein-protein interaction annotation extraction task of Bio
health-related classification using social media in public health surveillance
Automatic document classification tasks can be divided into three sorts:
Krallinger, M; Leitner, F; Rodriguez-Penagos, C; Valencia, A (2008).
Organizing information: Principles of data base and retrieval systems
genre classification, automatically determining the genre of a text
Information Retrieval: Implementing and Evaluating Search Engines
Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack.
Testing a Genre-Enabled Application: A Preliminary Assessment
TechTC - Technion Repository of Text Categorization Datasets
536:"An Interactive Automatic Document Classification Prototype"
Stephan Busemann, Sven Schmeier and Roman G. Arens (2000).
Rossi, R. G., Lopes, A. d. A., and Rezende, S. O. (2016).
BioCreative III ACT (article classification task) dataset
525:. Information Processing & Management, 52(2):217–257.
Interactive Automatic Document Classification Prototype
61:. This may be done "manually" (or "intellectually") or
618:"3 Document Classification Methods for Tough Projects"
Automatic document classification techniques include:
84:"Content-based" versus "request-based" classification
503:(2nd Ed.).]" Axiomathes, Vol. 18 No.2, pp. 193-210.
285:, automatically determining the language of a text
777:Machine learning in automated text categorization
261:Classification techniques have been applied to
75:Documents may be classified according to their
809:Bibliography on Automated Text Categorization
779:. ACM Computing Surveys, 34(1):1–47, 2002.
642:Message classification in the call center
804:Introduction to document classification
202:Instantaneously trained neural networks
158:semi-supervised document classification
140:Automatic document classification (ADC)
595:ABBYY FineReader Engine 11 for Windows
654:Santini, Marina; Rosso, Mark (2008),
1348:Simple Knowledge Organization System
821:Bibliography on Query Classification
150:unsupervised document classification
268:, a process which tries to discern
102:request-oriented classification as
146:supervised document classification
1363:Thesaurus (information retrieval)
27:Process of categorizing documents
584:Document Classification - Artsyl
327:Classification (disambiguation)
272:messages from legitimate emails
96:Request-oriented classification
944:Natural language understanding
462:. Orlando, FL: Academic Press.
392:Native Language Identification
246:K-nearest neighbour algorithms
110:Classification versus indexing
1468:Optical character recognition
377:Knowledge Organization System
342:Content-based image retrieval
1161:Multi-document summarization
337:Concept-based image indexing
89:Content-based classification
1658:Natural language processing
1491:Latent Dirichlet allocation
1463:Natural language generation
1328:Machine-readable dictionary
1323:Linguistic Linked Open Data
898:Natural language processing
222:Natural language processing
124:Frederick Wilfrid Lancaster
104:policy-based classification
1243:Explicit semantic analysis
992:Deep linguistic processing
706:10.1109/SECON.2017.7925400
458:Soergel, Dagobert (1985).
212:Multiple-instance learning
49:. The task is to assign a
1086:Word-sense disambiguation
939:Computational linguistics
857:David D. Lewis's Datasets
347:Decimal section numbering
173:Artificial neural network
1648:Knowledge representation
1612:Natural Language Toolkit
1536:Pronunciation assessment
1438:Automatic identification
1268:Latent semantic analysis
1224:Distributional semantics
1109:Compound-term processing
1007:Named-entity recognition
332:Compound term processing
207:Latent semantic indexing
196:Expectation maximization
1516:Automated essay scoring
1486:Document classification
1153:Automatic summarization
740:10.1186/gb-2008-9-s2-s4
571:April 24, 2015, at the
499:Broughton, V. (2008). "
283:language identification
240:Support vector machines
35:document categorization
31:Document classification
1373:Universal Dependencies
1066:Terminology extraction
1049:Semantic decomposition
1044:Semantic role labeling
1034:Part-of-speech tagging
1002:Information extraction
987:Coreference resolution
977:Collocation extraction
382:Library classification
372:Knowledge organization
292:readability assessment
217:Naive Bayes classifier
1134:Sentence segmentation
775:Fabrizio Sebastiani.
416:unsupervised learning
367:Information retrieval
133:controlled vocabulary
1586:Voice user interface
1297:datasets and corpora
1238:Document-term matrix
1091:Word-sense induction
606:Classifier - Antidot
1643:Information science
1566:Interactive fiction
1496:Pachinko allocation
1453:Speech segmentation
1409:Google Ngram Viewer
1181:Machine translation
1171:Text simplification
1166:Sentence extraction
1054:Semantic similarity
833:Text Classification
412:Supervised learning
402:Subject (documents)
362:Document clustering
296:text simplification
154:document clustering
127:transformed into a
70:text classification
43:information science
18:Text Classification
1576:Question answering
1448:Speech recognition
1313:Corpus linguistics
1293:Language resources
1076:Textual entailment
1059:Sentiment analysis
850:2020-02-14 at the
841:(available online)
826:2019-10-02 at the
814:2019-09-26 at the
793:. MIT Press, 2010.
789:2020-10-05 at the
357:Document retrieval
302:sentiment analysis
1581:Virtual assistant
1506:Computer-assisted
1189:Computer-assisted
1139:Word segmentation
1101:Text segmentation
1039:Semantic analysis
1027:Syntactic parsing
1012:Ontology learning
236:-based classifier
230:-based classifier
1653:Machine learning
1602:Formal semantics
1551:Natural language
1458:Speech synthesis
1440:and data capture
1343:Semantic network
1318:Lexical resource
1119:Lexical analysis
1022:Semantic parsing
541:. Archived from
407:Subject indexing
387:Machine learning
120:subject indexing
47:computer science
37:is a problem in
1570:Syntax guessing
1531:Predictive text
1526:Grammar checker
1394:Bank of English
997:Distant reading
972:Argument mining
954:Text processing
852:Wayback Machine
828:Wayback Machine
816:Wayback Machine
791:Wayback Machine
770:Further reading
733:(Suppl 2): S4.
573:Wayback Machine
152:(also known as
118:to documents ("
63:algorithmically
53:to one or more
39:library science
799:
798:External links
727:Genome Biology
429:concept mining
397:String metrics
322:Categorization
266:spam filtering
183:Decision trees
178:Concept Mining
835:analysis page
670:on 2019-11-15
622:www.bisok.com
548:on 2017-11-15
672:, retrieved
665:the original
625:. Retrieved
550:. Retrieved
543:the original
257:Applications
72:is implied.
1638:Data mining
1478:Topic model
1358:Text corpus
1204:Statistical
1071:Text mining
912:AI-complete
421:Text mining
270:E-mail spam
1632:Categories
1199:Rule-based
1081:Truecasing
949:Stop words
674:2011-10-21
627:2021-08-04
552:2017-11-14
436:References
425:web mining
224:approaches
164:Techniques
122:") but as
59:categories
1508:reviewing
1306:standards
1304:Types and
228:Rough set
129:thesaurus
1424:Wikidata
1404:FrameNet
1389:BabelNet
1368:Treebank
1338:PropBank
1283:Word2vec
1248:fastText
1129:Stemming
848:Archived
824:Archived
812:Archived
787:Archived
759:18834495
721:Creative
569:Archived
481:160-181.
352:Document
315:See also
234:Soft set
185:such as
136:label.
116:subjects
77:subjects
51:document
1595:Related
1561:Chatbot
1419:WordNet
1399:DBpedia
1273:Seq2seq
1017:Parsing
932:Trigram
750:2559988
449:work.")
277:routing
55:classes
1568:(c.f.
1226:models
1214:Neural
927:Bigram
922:n-gram
757:
747:
298:system
275:email
251:tf–idf
1617:spaCy
1262:large
1253:GloVe
668:(PDF)
661:(PDF)
546:(PDF)
539:(PDF)
242:(SVM)
1382:Data
1233:BERT
755:PMID
198:(EM)
191:C4.5
45:and
1414:UBY
745:PMC
735:doi
723:II"
702:doi
189:or
187:ID3
57:or
33:or
1634::
753:.
743:.
729:.
725:.
620:.
427:,
423:,
414:,
41:,
1572:)
1295:,
1264:)
1260:(
761:.
737::
731:9
704::
630:.
555:.