Indic OCR refers to the process of converting images of text written in Indic scripts into e-text using optical character recognition (OCR) techniques. Broadly, it can also refer to OCR systems for the Brahmic scripts used by languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, which are all written in an abugida-based writing system.

OCR for Latin characters is still not 100% accurate, but a relatively high degree of conversion accuracy has been achieved. Such accuracy has not yet been achieved for Indic scripts. This is due in part to the writing systems of Indic languages, as well as a lack of standard representation, encoding, and support among operating systems and keyboards.

The Centre for Development of Advanced Computing (C-DAC) and Technology Development for Indian Languages, the premier R&D organisation of the Ministry of Electronics and Information Technology (also known as MeitY) of India, have carried out many projects relating to OCR. Their projects include OCR for Malayalam, Odia, Punjabi, Telugu and the Devanagari script.

Properties of Indian writing systems

There are 22 officially recognised languages in India. Of these, Hindi, Bengali and Punjabi are the most widely spoken Indo-Aryan languages and are also the fourth, seventh and tenth most widely spoken languages in the world respectively. Two or more languages can be written with the same script. For example, Devanagari is used to write Hindi, Marathi, Rajasthani, Sanskrit, Bhojpuri and others, while Eastern Nagari is used to write Bengali, Assamese, Manipuri and others.

Apart from basic characters such as consonants and vowels, most Indic languages combine two or more basic characters to form compound characters. The shape of a compound character is more complex than that of its constituent basic characters. Some Indo-Aryan languages (including Hindi and Punjabi) have a horizontal line over the characters, while other languages (including Gujarati) and Dravidian languages (Malayalam, Kannada, Tamil, and Telugu) do not. These are some of the main challenges for creating a single OCR for all Indic languages.

Indic OCR also generally includes support for recently invented scripts in India such as Ol Chiki, Warang Citi and Mundari Bani, which were mainly created for writing Munda languages of the Austroasiatic family.

The concept of upper/lower case is absent in Indic scripts. Apart from Urdu, Sindhi, Kashmiri and Thaana, all other Indic languages are written from left to right.

Examples

SanskritOCR - OCR software for Sanskrit, Hindi and other Indo-Aryan languages based on the Devanagari script. SanskritOCR is developed by a Sanskrit scholar from Germany, Dr. Oliver Hellwig of the Department for Languages and Cultures of Southern Asia, Freie Universität Berlin. The official website is in German. The interface of earlier versions of the software was also in German, but later versions have an English interface too.
E-aksharayan - Optical character recognition engine for Indian languages.
Chitrankan - This technology was developed by ISI, Kolkata, and transferred to C-DAC. It processes printed Hindi text from a scanner or from an image.
Indic OCR models for Tesseract.

OCR in use

OCR has been used for Wikisource and other projects.

References

GmbH, Lesson Nine. "The 10 Most Spoken Languages In The World". The Babbel Magazine. Retrieved 2018-03-20.
Pal, U.; Chaudhuri, B.B. (2004-09-01). "Indian script character recognition: a survey". Pattern Recognition. 37 (9): 1887–1899. doi:10.1016/j.patcog.2004.02.003. ISSN 0031-3203.
Prabhu, S. (2020-06-04). "Pazhur Patasala — a revival story". The Hindu. ISSN 0971-751X. Retrieved 2021-09-01. "An OCR (Optical Character Recognition) for Sanskrit has created an offline corpus that includes over 3,000 books."
"Digitisation going on at brisk pace: Vice-Chancellor Prof V Muralidhara Sharma". www.thehansindia.com. Hans News Service. 2019-03-20. Retrieved 2021-09-01.
Dikshit, Ashish (2016-10-27). "Who Says Sanskrit Is Dead? It's Rocking the Wiki World". TheQuint. Retrieved 2021-09-01.
"Multilingual Computing & Heritage Computing". www.cdac.in. Retrieved 2017-02-12.
Singh, Rustam (2016-04-16). "The Magic of OCR & Augmented Reality Translates text in Indian Languages, Real Time – Without Internet". Entrepreneur. Retrieved 2017-02-12.
"Indian Language Technology Proliferation and Deployment Centre - Home". www.tdil-dc.in. Retrieved 2017-02-12.

External links

"SanskritOCR - Optical Text Recognition for Sanskrit Documents".
"C-DAC: GIST - Products - Chitrankan". cdac.in. Retrieved 2017-02-12.
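Illustration: compound characters in Devanagari

The point made under "Properties of Indian writing systems", that two or more basic characters combine into one compound character, can be illustrated with a short Python sketch. It is not taken from any of the OCR systems listed above. The function name split_clusters and its grouping rule are assumptions made for illustration, and the rule is deliberately simplified: it handles only the Devanagari virama and ignores joiner characters such as ZWJ and ZWNJ.

```python
import unicodedata

# DEVANAGARI SIGN VIRAMA: suppresses the inherent vowel and joins consonants.
VIRAMA = "\u094D"


def split_clusters(text):
    """Group code points into rough written units (hypothetical helper).

    A combining mark (vowel sign, virama, anusvara) attaches to the unit
    before it, and a letter that follows a virama is merged into the same
    unit, which approximates a compound (conjunct) character.
    """
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins_previous = (
            bool(clusters)
            and clusters[-1].endswith(VIRAMA)
            and category == "Lo"
        )
        if clusters and (is_mark or joins_previous):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The Hindi word "kshatriya", written with escapes so the code points are explicit.
word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
print(len(word))                  # 8 code points
print(len(split_clusters(word)))  # 3 written units
```

Under these assumptions the eight stored code points collapse into three visual units, two of which are compound characters. An OCR system has to recognise such units by shape, so the set of shapes to be learned is much larger than the set of basic characters.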