Knowledge (XXG)

Document processing

Document processing is a field of research and a set of production processes aimed at making an analog document digital. Document processing does not simply aim to photograph or scan a document to obtain a digital image, but also to make it digitally intelligible. This includes extracting the structure of the document or the layout and then the content, which can take the form of text or images. The process can involve traditional computer vision algorithms, convolutional neural networks or manual labor. The problems addressed are related to semantic segmentation, object detection, optical character recognition (OCR), handwritten text recognition (HTR) and, more broadly, transcription, whether automatic or not. The term can also include the phase of digitizing the document using a scanner and the phase of interpreting the document, for example using natural language processing (NLP) or image classification technologies. It is applied in many industrial and scientific fields for the optimization of administrative processes, mail processing and the digitization of analog archives and historical documents.

Background

Document processing was initially, and to some extent still is, a kind of production-line work dealing with the treatment of documents, such as letters and parcels, with the aim of sorting them or extracting data from them, often in bulk. This work could be performed in-house or through business process outsourcing. Document processing can indeed involve some kind of externalized manual labor, such as mechanical Turk.

As an example of manual document processing, as recently as 2007, document processing for "millions of visa and citizenship applications" relied on "approximately 1,000 contract workers" working to "manage mail room and data entry."

While document processing involved data entry via keyboard well before use of a computer mouse or a computer scanner, a 1990 article in The New York Times regarding what it called the "paperless office" stated that "document processing begins with the scanner". In this context, a former Xerox vice-president, Paul Strassman, expressed a critical opinion, saying that computers add to rather than reduce the volume of paper in an office. It was said that the engineering and maintenance documents for an airplane weigh "more than the airplane itself".

Automatic document processing

As the state of the art advanced, document processing transitioned to handling "document components ... as database entities."

A technology called automatic document processing, or sometimes intelligent document processing (IDP), emerged as a specific form of Intelligent Process Automation (IPA), combining artificial intelligence techniques such as Machine Learning (ML), Natural Language Processing (NLP) or Intelligent Character Recognition (ICR) to extract data from several types of documents. Advancements in automatic document processing improve the ability to process unstructured data with fewer exceptions and greater speed.

Applications

Automatic document processing applies to a whole range of documents, whether structured or not. For instance, in the world of business and finance, technologies may be used to process paper-based invoices, forms, purchase orders, contracts, and currency bills. Financial institutions use intelligent document processing to process high volumes of forms such as regulatory forms or loan documents. IDP uses AI to extract and classify data from documents, replacing manual data entry.

In medicine, document processing methods have been developed to facilitate patient follow-up and streamline administrative procedures, in particular by digitizing medical or laboratory analysis reports. The goal is also to standardize medical databases. Algorithms are also directly used to assist physicians in medical diagnosis, e.g. by analyzing magnetic resonance images or microscopic images.

Document processing is also widely used in the humanities and digital humanities, in order to extract historical big data from archives or heritage collections. Specific approaches were developed for various sources, including textual documents, such as newspaper archives, but also images and maps.

Technologies

From the 1980s onward, traditional computer vision algorithms were widely used to solve document processing problems; in the 2010s they were gradually replaced by neural network technologies. However, traditional computer vision technologies are still used in some sectors, sometimes in conjunction with neural networks.

Many technologies support the development of document processing, in particular optical character recognition (OCR) and handwritten text recognition (HTR), which allow the text to be transcribed automatically. The text segments themselves are identified using instance or object detection algorithms, which can sometimes also be used to detect the structure of the document. The resolution of the latter problem sometimes also uses semantic segmentation algorithms.

These technologies often form the core of document processing. However, other algorithms may intervene before or after these processes. Indeed, document digitization technologies are also involved, whether in the form of classical or three-dimensional scanning. The digitization of 3D documents can in particular resort to derivatives of photogrammetry. Sometimes, specific 2D scanners must also be developed to adapt to the size of the documents or for reasons of scanning ergonomics. Document processing also depends on the digital encoding of the documents in a suitable file format. Furthermore, the processing of heterogeneous databases can rely on image classification technologies.

At the other end of the chain are various image completion, extrapolation or data cleanup algorithms. For textual documents, the interpretation can use natural language processing (NLP) technologies.

See also

Document automation
Document modelling
Data Processing
Document Imaging
Duplex scanning
Text mining
Workflow

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.