
Indic OCR


Indic OCR refers to the process of converting images of text written in Indic scripts into e-text using optical character recognition (OCR) techniques. More broadly, it can also refer to OCR systems for the Brahmic scripts of languages of South Asia and Southeast Asia, not only the scripts of the Indian subcontinent; all of these scripts are abugida-based writing systems.

OCR for Latin characters is still not 100% accurate, but it achieves a relatively high degree of accuracy. Comparable accuracy has not yet been achieved for Indic scripts. This is due in part to the writing systems of Indic languages, and in part to a lack of standard representation and encoding, and of support among operating systems and keyboards.
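
OCR accuracy is commonly reported as a character error rate (CER): the edit distance between the recognised text and a reference transcription, divided by the length of the reference. The Python sketch below is illustrative only; `levenshtein` and `cer` are hypothetical helper names, not functions from any particular OCR toolkit. It normalises both strings to Unicode NFC before comparing them, because the same Indic glyph can be encoded in more than one way: for example, Devanagari QA can be stored as the single code point U+0958 or as KA (U+0915) followed by NUKTA (U+093C).

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over NFC-normalised code points (hypothetical helper)."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    # max(..., 1) avoids dividing by zero for an empty reference.
    return levenshtein(ref, hyp) / max(len(ref), 1)


if __name__ == "__main__":
    precomposed = "\u0958"        # DEVANAGARI LETTER QA
    decomposed = "\u0915\u093c"   # KA + NUKTA
    print(levenshtein(precomposed, decomposed))  # 2: raw code points differ
    print(cer(precomposed, decomposed))          # 0.0: identical after NFC
```

This CER counts code points rather than visual aksharas, so a single misrecognised compound character can contribute several errors.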
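
The Centre for Development of Advanced Computing (C-DAC), the premier R&D organisation of the Ministry of Electronics and Information Technology (MeitY) of India, and the Technology Development for Indian Languages (TDIL) programme have carried out many projects relating to OCR. These include OCR for Malayalam, Odia, Punjabi, Telugu and the Devanagari script.

Properties of Indian writing systems

There are 22 officially recognised languages in India. Of these, Hindi, Bengali and Punjabi are the most widely spoken Indo-Aryan languages; they are also, respectively, the fourth, seventh and tenth most widely spoken languages in the world. Two or more languages can be written in the same script. For example, Devanagari is used to write Hindi, Marathi, Rajasthani, Sanskrit, Bhojpuri and others, while Eastern Nagari is used to write Bengali, Assamese, Manipuri and others.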
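
Because one script can serve several languages, a multilingual OCR pipeline may first identify the script and only then choose a language model. The Python sketch below uses a hypothetical helper, `dominant_script`, and assumes the input is already Unicode text (for example, an OCR draft). It counts characters per Unicode block; the table lists only the main block for each script and omits extension blocks such as Devanagari Extended.

```python
from collections import Counter
from typing import Optional

# Main Unicode block (inclusive code point range) for some scripts used in India.
SCRIPT_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),   # also used for Assamese (Eastern Nagari)
    "Gurmukhi": (0x0A00, 0x0A7F),  # Punjabi
    "Gujarati": (0x0A80, 0x0AFF),
    "Oriya": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
    "Ol Chiki": (0x1C50, 0x1C7F),
    "Warang Citi": (0x118A0, 0x118FF),
}


def dominant_script(text: str) -> Optional[str]:
    """Return the script whose block contains the most characters of text, or None."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for script, (lo, hi) in SCRIPT_BLOCKS.items():
            if lo <= cp <= hi:
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    print(dominant_script("हिन्दी"))  # Devanagari (Hindi)
    print(dominant_script("मराठी"))   # Devanagari (Marathi): same script, different language
    print(dominant_script("বাংলা"))   # Bengali
```

As the Hindi and Marathi calls show, identifying the script narrows the choice of language models but does not by itself determine the language.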

Apart from basic characters such as consonants and vowels, most Indic languages combine two or more basic characters to form compound characters. The shape of a compound character is more complex than the shapes of its constituent basic characters.
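
In Unicode, a Devanagari compound character (conjunct) has no code point of its own. It is stored as a sequence of consonants joined by the virama (U+094D), and the font decides whether to draw it as one ligature or as half forms. An OCR engine that recognises the single visual shape must therefore emit the whole sequence. The Python sketch below illustrates this; `make_conjunct` and `describe` are hypothetical helper names.

```python
import unicodedata
from typing import List

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA: suppresses the inherent vowel


def make_conjunct(*consonants: str) -> str:
    """Join consonants with virama, e.g. KA + SSA for the conjunct often drawn as one glyph."""
    return VIRAMA.join(consonants)


def describe(cluster: str) -> List[str]:
    """List the Unicode character names that make up a cluster."""
    return [unicodedata.name(ch) for ch in cluster]


if __name__ == "__main__":
    ksha = make_conjunct("\u0915", "\u0937")  # KA + SSA
    print(len(ksha))  # 3 code points, although usually rendered as one glyph
    print(describe(ksha))
    # ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA', 'DEVANAGARI LETTER SSA']
    stra = make_conjunct("\u0938", "\u0924", "\u0930")  # SA + TA + RA
    print(len(stra))  # 5 code points
```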

Some Indo-Aryan languages (including Hindi and Punjabi) have a horizontal line over the characters, while other languages (including Gujarati) and the Dravidian languages (Malayalam, Kannada, Tamil and Telugu) do not. Compound characters and these differences in character shape are some of the main challenges in creating a single OCR system for all Indic languages.
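
For scripts that have a headline (the horizontal line over the characters), one common segmentation idea is to locate the headline with a horizontal projection profile (ink per row) and then look for ink-free columns once the headline is ignored, since the headline is what joins the characters of a word. The Python sketch below assumes an already binarised, deskewed word image given as a list of rows of 0/1 pixels; `find_headline` and `split_below_headline` are hypothetical names. Real headlines are several pixels thick and touching characters need further work, so this is a simplification, not a production segmenter.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def find_headline(word: Bitmap) -> int:
    """Index of the row with the most ink: the candidate headline."""
    profile = [sum(row) for row in word]
    return max(range(len(profile)), key=lambda r: profile[r])


def split_below_headline(word: Bitmap, headline: int, band: int = 0) -> List[Tuple[int, int]]:
    """Return (first, last) column spans of ink, ignoring rows within `band` of the headline."""
    rows = [r for r in range(len(word)) if abs(r - headline) > band]
    col_ink = [sum(word[r][c] for r in rows) for c in range(len(word[0]))]
    spans, start = [], None
    for c, ink in enumerate(col_ink):
        if ink and start is None:
            start = c                      # a character begins
        elif not ink and start is not None:
            spans.append((start, c - 1))   # a character ends
            start = None
    if start is not None:
        spans.append((start, len(col_ink) - 1))
    return spans


if __name__ == "__main__":
    word = [
        [1, 1, 1, 1, 1, 1, 1],  # headline joins everything
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    top = find_headline(word)
    print(top)                              # 0
    print(split_below_headline(word, top))  # [(1, 1), (4, 5)]
```

If the headline row were not ignored, every column would contain ink and the whole word would come out as a single span, which is one reason scripts with and without a headline may need different segmentation strategies.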

Indic OCR also generally includes support for scripts recently invented in India, such as Ol Chiki, Warang Citi and Mundari Bani, which were created mainly for writing the Munda languages of the Austroasiatic family.

The concept of upper and lower case is absent in Indic scripts. Apart from Urdu, Sindhi and Kashmiri (in their Perso-Arabic scripts) and the Thaana script, all other Indic scripts are written from left to right.
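
The absence of letter case and the writing direction of a script can both be checked from Unicode character data. The Python sketch below uses a hypothetical helper, `dominant_direction`, which reads each character's bidirectional class via `unicodedata.bidirectional` to guess whether a line of recognised text should be laid out left to right or right to left. It also shows that case conversion leaves Devanagari unchanged.

```python
import unicodedata
from typing import Optional


def dominant_direction(text: str) -> Optional[str]:
    """'ltr' or 'rtl' depending on which strong bidi class is more common; None if neither occurs."""
    ltr = rtl = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            ltr += 1
        elif bidi in ("R", "AL"):  # AL = Arabic Letter class (Thaana letters also have it)
            rtl += 1
    if ltr == 0 and rtl == 0:
        return None
    return "rtl" if rtl > ltr else "ltr"


if __name__ == "__main__":
    print(dominant_direction("हिन्दी"))  # ltr (Devanagari)
    print(dominant_direction("اردو"))    # rtl (Urdu, Perso-Arabic script)
    print("हिन्दी".upper() == "हिन्दी")  # True: no case distinction to fold
```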

Examples

- SanskritOCR: OCR software for Sanskrit, Hindi and other Indo-Aryan languages written in the Devanagari script. It was developed by a Sanskrit scholar from Germany, Dr. Oliver Hellwig of the Department for Languages and Cultures of Southern Asia, Freie Universität Berlin. The official website is in German. The interface of earlier versions of the software was also in German, but later versions offer an English interface too.
- E-aksharayan: an optical character recognition engine for Indian languages.
- Chitrankan: developed by the Indian Statistical Institute (ISI), Kolkata, and transferred to C-DAC. It processes printed Hindi text from a scanner or from an image.
- Indic OCR models for Tesseract.

OCR in use

OCR has been used for Wikisource and other projects. An OCR system for Sanskrit has been used to create an offline corpus of over 3,000 books.

References

- "The 10 Most Spoken Languages In The World". The Babbel Magazine. Lesson Nine GmbH. Retrieved 2018-03-20.
- Pal, U.; Chaudhuri, B. B. (2004-09-01). "Indian script character recognition: a survey". Pattern Recognition. 37 (9): 1887–1899. doi:10.1016/j.patcog.2004.02.003. ISSN 0031-3203.
- Prabhu, S. (2020-06-04). "Pazhur Patasala — a revival story". The Hindu. ISSN 0971-751X. Retrieved 2021-09-01.
- "Digitisation going on at brisk pace: Vice-Chancellor Prof V Muralidhara Sharma". www.thehansindia.com. Hans News Service. 2019-03-20. Retrieved 2021-09-01.
- Dikshit, Ashish (2016-10-27). "Who Says Sanskrit Is Dead? It's Rocking the Wiki World". TheQuint. Retrieved 2021-09-01.
- "Multilingual Computing & Heritage Computing". www.cdac.in. Retrieved 2017-02-12.
- Singh, Rustam (2016-04-16). "The Magic of OCR & Augmented Reality Translates text in Indian Languages, Real Time – Without Internet". Entrepreneur. Retrieved 2017-02-12.
- "Indian Language Technology Proliferation and Deployment Centre - Home". www.tdil-dc.in. Retrieved 2017-02-12.

External links

- "SanskritOCR - Optical Text Recognition for Sanskrit Documents".
- "C-DAC: GIST - Products - Chitrankan". cdac.in. Retrieved 2017-02-12.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.