The Czech National Corpus (CNC) (Czech: Český národní korpus) is a large electronic corpus of written and spoken Czech, developed by the Institute of the Czech National Corpus (ICNC) in the Faculty of Arts at Charles University in Prague. The collection is used for teaching and research in corpus linguistics. The ICNC collaborates with over 200 researchers and students (mainly for spoken and parallel data acquisition), 270 publishers (as text providers), and other similar research projects.

Areas of focus

The Czech National Corpus focuses systematically on the following areas:

Synchronic written corpora: the SYN-series corpora map the Czech language of the 20th and 21st centuries (especially the last twenty years) and form the core of the project. Texts are enriched with metadata, lemmatization, and morphological tagging.

Contemporary spontaneous spoken Czech: the ORAL-series corpora contain contemporary, spontaneous spoken language used in informal situations throughout the Czech Republic (as opposed to the prepared, broadcast, or scripted texts generally found in spoken corpora).

Multilingual parallel corpus: InterCorp is a large corpus of Czech texts aligned at the sentence level with translations to or from more than 30 languages. The core of the corpus consists of manually aligned and proofread fiction texts.

Diachronic corpus of Czech: the DIAKORP corpus of historical Czech includes texts from the 14th century onwards. The current focus of DIAKORP is on the 19th century. Its long-term goal is a corpus covering the period from 1850 to the present and interconnecting the data with the SYN series.

Specialised linguistic data: the ICNC is also involved in collecting language data for specific research purposes, including DIALEKT (dialectal speech), CzeSL (texts written by non-native learners of Czech), DEAF (Czech texts written by deaf people), and Jerome (translated and non-translated Czech).

References

"Institute of the Czech National Corpus". Institute of the Czech National Corpus. Archived from the original on 9 January 2019. Retrieved 8 January 2019.
Křen, Michal. "Recent Developments in the Czech National Corpus" (PDF). Publication Server of the Institute for German Language. Retrieved 8 January 2019.
M. Hnátková, M. Křen, P. Procházka, and H. Skoumalová. (2014). "The SYN-series corpora of written Czech". Proceedings of LREC2014: 160–164. S2CID 2586912.
L. Válková, M. Waclawičová, and M. Křen. (2012). "Balanced data repository of spontaneous spoken Czech" (PDF). Proceedings of LREC2012: 3345–3349. Retrieved 9 January 2019.
F. Čermák and A. Rosen. (2012). "The case of InterCorp, a multilingual parallel corpus" (PDF). International Journal of Corpus Linguistics. 17 (3): 411–427. doi:10.1075/ijcl.17.3.05cer. Retrieved 9 January 2019.
K. Kučera and M. Stluka. (2014). "Corpus of 19th century Czech texts: Problems and solutions" (PDF). Proceedings of LREC2014: 165–168. Retrieved 9 January 2019.

External links

Český národní korpus
Institute of the Czech National Corpus