Czech National Corpus


Set of written texts in electronic form in the Czech language

The Czech National Corpus (CNC) (Czech: Český národní korpus) is a large electronic corpus of written and spoken Czech, developed by the Institute of the Czech National Corpus (ICNC) in the Faculty of Arts at Charles University in Prague. The collection is used for teaching and research in corpus linguistics. The ICNC collaborates with over 200 researchers and students (mainly for spoken and parallel data acquisition), 270 publishers (as text providers), and other similar research projects.

Areas of focus

The Czech National Corpus focuses systematically on the following areas:

Synchronic written corpora: the SYN-series corpora map the Czech language of the 20th and 21st centuries (especially the last twenty years) and form the core of the project. Texts are enriched with metadata, lemmatization, and morphological tagging.

Contemporary spontaneous spoken Czech: the ORAL-series corpora contain contemporary, spontaneous spoken language used in informal situations throughout the Czech Republic (as opposed to the prepared, broadcast, or scripted texts generally found in spoken corpora).

Multilingual parallel corpus: InterCorp is a large corpus of Czech texts aligned at the sentence level with translations to or from more than 30 languages. The core of the corpus consists of manually aligned and proofread fiction texts.

Diachronic corpus of Czech: the DIAKORP corpus of historical Czech includes texts from the 14th century onwards. The current focus of DIAKORP is on the 19th century. The long-term goal of DIAKORP is to create a corpus covering the period of 1850–present and to interconnect the data with the SYN series.

Specialised linguistic data: the ICNC is also involved in the collection of language data for specific research purposes, including DIALEKT (dialectal speech), CzeSL (texts written by non-native learners of Czech), DEAF (Czech texts written by the deaf), and Jerome (translated and non-translated Czech).

References

1. "Institute of the Czech National Corpus". Institute of the Czech National Corpus. Archived from the original on 9 January 2019. Retrieved 8 January 2019.
2. Křen, Michal. "Recent Developments in the Czech National Corpus" (PDF). Publication Server of the Institute for German Language. Retrieved 8 January 2019.
3. M. Hnátková, M. Křen, P. Procházka, and H. Skoumalová (2014). "The SYN-series corpora of written Czech". Proceedings of LREC2014: 160–164. S2CID 2586912.
4. L. Válková, M. Waclawičová, and M. Křen (2012). "Balanced data repository of spontaneous spoken Czech" (PDF). Proceedings of LREC2012: 3345–3349. Retrieved 9 January 2019.
5. F. Čermák and A. Rosen (2012). "The case of InterCorp, a multilingual parallel corpus" (PDF). International Journal of Corpus Linguistics. 17 (3): 411–427. doi:10.1075/ijcl.17.3.05cer. Retrieved 9 January 2019.
6. K. Kučera and M. Stluka (2014). "Corpus of 19th century Czech texts: Problems and solutions" (PDF). Proceedings of LREC2014: 165–168. Retrieved 9 January 2019.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
