CLAWS (linguistics)

The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language (UCREL). It has an overall accuracy rate of 96–97%, with the latest version (CLAWS4) having tagged around 100 million words of the British National Corpus.
History

A part-of-speech tagger (POS tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word and other token, although computational applications generally use more fine-grained POS tags such as "noun-plural". Developed in the early 1980s, CLAWS was built to fill the ever-growing gap created by constantly changing POS-tagging needs. Originally created to add part-of-speech tags to the LOB corpus of British English, the CLAWS tagset has since been adapted to other languages as well, including Urdu and Arabic.

Since its inception, CLAWS has been hailed for its functionality and adaptability. Still, it is not without flaws: although it boasts an error rate of only 1.5% when judged on major categories, around 3.3% of ambiguities remain unresolved. Ambiguity arises in cases such as the word "flies", which may be classified as either a noun or a verb. It is these ambiguities that have driven the various upgrades and tagsets that CLAWS has undergone.
Rules and processing

CLAWS uses a hidden Markov model to determine the likelihood of sequences of words when assigning each part-of-speech label.
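CLAWS itself is far more elaborate, but the core idea of HMM tagging can be sketched briefly: the tagger picks the tag sequence that maximizes the product of tag-transition and word-emission probabilities, typically with Viterbi decoding. The probabilities below are toy values invented for illustration (not CLAWS's actual statistics), with C5-style tag names used only for flavour:

```python
# Minimal sketch of HMM-based POS tagging with Viterbi decoding.
# Transition/emission probabilities are invented toy values, not
# CLAWS's real statistics; tags are C5-style for flavour only.
import math

TAGS = ["AT0", "NN1", "VVB"]  # article, singular noun, verb base form

# P(tag | previous tag); "<s>" marks the sentence start.
TRANS = {
    ("<s>", "AT0"): 0.5, ("<s>", "NN1"): 0.3, ("<s>", "VVB"): 0.2,
    ("AT0", "NN1"): 0.8, ("AT0", "VVB"): 0.1, ("AT0", "AT0"): 0.1,
    ("NN1", "VVB"): 0.5, ("NN1", "NN1"): 0.3, ("NN1", "AT0"): 0.2,
    ("VVB", "AT0"): 0.5, ("VVB", "NN1"): 0.4, ("VVB", "VVB"): 0.1,
}

# P(word | tag) for a toy vocabulary; "flies" is ambiguous (noun or verb).
EMIT = {
    ("AT0", "the"): 0.9,
    ("NN1", "flies"): 0.1, ("NN1", "time"): 0.2,
    ("VVB", "flies"): 0.3, ("VVB", "time"): 0.01,
}

def viterbi(words):
    """Return the most likely tag sequence under the toy model."""
    # best[tag] = (log-probability, tag path) for paths ending in tag
    best = {t: (math.log(TRANS.get(("<s>", t), 1e-12))
                + math.log(EMIT.get((t, words[0]), 1e-12)), [t])
            for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            score, path = max(
                (best[p][0] + math.log(TRANS.get((p, t), 1e-12))
                 + math.log(EMIT.get((t, w), 1e-12)), best[p][1])
                for p in TAGS)
            new[t] = (score, path + [t])
        best = new
    return max(best.values())[1]

print(viterbi(["the", "time", "flies"]))  # → ['AT0', 'NN1', 'VVB']
```

Note how context resolves the ambiguity discussed above: after "the time", the model prefers the verb reading of "flies", whereas in "the flies" the same word comes out as a noun.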
Sample outputs of CLAWS

This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the C5 and C7 tagsets. It illustrates what CLAWS output generally looks like, with the most likely part-of-speech tag following each word.

C5 tagset:

-----_PUN "_PUQ Welcome_VVB to_PRP my_DPS house_NN1 !_SENT -----_PUN Enter_VVB freely_AV0 and_CJC of_PRF your_DPS own_DT0 will_NN1 !_PUN "_SENT -----_PUN He_PNP made_VVD no_AT0 motion_NN1 of_PRF stepping_VVG to_TO0 meet_VVI me_PNP ,_PUN but_CJC stood_VVD like_PRP a_AT0 statue_NN1 ,_PUN as_CJS though_CJS his_DPS gesture_NN1 of_PRF welcome_NN1 had_VHD fixed_VVN him_PNP into_PRP stone_SENT ._PUN

C7 tagset:

"_" Welcome_VV0 to_II my_APPGE house_NN1 !_! Enter_VV0 freely_RR and_CC of_IO your_APPGE own_DA will_NN1 !_! "_" He_PPHS1 made_VVD no_AT motion_NN1 of_IO stepping_VVG to_TO meet_VVI me_PPIO1 ,_, but_CCB stood_VVD like_II a_AT1 statue_NN1 ,_, as_CS21 though_CS22 his_APPGE gesture_NN1 of_IO welcome_NN1 had_VHD fixed_VVN him_PPHO1 into_II stone_NN1 ._.
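In this format each token is a word joined to its tag by an underscore, so downstream tools can recover (word, tag) pairs with a simple split. A minimal sketch (the helper name is ours, not part of CLAWS):

```python
# Split CLAWS-style "word_TAG" tokens into (word, tag) pairs.
# The function name and error handling are illustrative, not part of CLAWS.
def parse_claws(tagged_text):
    pairs = []
    for token in tagged_text.split():
        # rpartition splits on the LAST underscore, so tokens whose word
        # part contains an underscore are still handled correctly
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

sample = "Welcome_VV0 to_II my_APPGE house_NN1 !_!"
print(parse_claws(sample))
# → [('Welcome', 'VV0'), ('to', 'II'), ('my', 'APPGE'), ('house', 'NN1'), ('!', '!')]
```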
Tagsets

CLAWS1 tagset

The first tagset developed, the CLAWS1 tagset, has 132 word tags. In terms of form and application, the C1 tagset is similar to the Brown Corpus tags. See the table of tags in the C1 tagset here.

CLAWS2 tagset

From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger attempt to deal with aspects such as recognizing sentence breaks, in order to avoid the need for manual pre-processing of a text before the tags were applied, moving instead to optional manual post-editing that adjusts the output of the automatic annotation if needed. The CLAWS2 tagset has 166 word tags. See the table of tags in the C2 tagset here.

CLAWS4 tagset

CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent of the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a selected 'core' sample corpus of two million words." The latest version of CLAWS4 is offered by UCREL, a research centre of Lancaster University.

CLAWS5 tagset

The CLAWS5 tagset, which was used for the BNC, has over 60 tags. See the table of tags in the C5 tagset here.

CLAWS6 tagset

The CLAWS6 tagset was used for the BNC sampler corpus and the COLT corpus. It has over 160 tags, including 13 determiner subtypes. See the table of tags in the C6 tagset here.

CLAWS7 tagset

The standard CLAWS7 tagset is the one currently used. It differs from the CLAWS6 tagset only in its punctuation tags. See the table of tags in the C7 tagset here.

CLAWS8 tagset

The CLAWS8 tagset was extended from the C7 tagset with further distinctions in the determiner and pronoun categories, as well as 37 new auxiliary tags for forms of be, do, and have. See the table of tags in the C8 tagset here.

See also

Brill tagger
Part-of-speech tagging
Sliding window based part-of-speech tagging
British National Corpus (BNC)
Brown Corpus
Lancaster University
Hidden Markov model

References

Garside, Roger. 1987. The CLAWS word-tagging system. In: R. Garside, G. Leech & G. Sampson (eds.), The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Garside, Roger. 1996. The robust tagging of unrestricted text: the BNC experience. In J. Thomas & M. Short (eds.), Using Corpora for Language Research: Studies in the Honour of Geoffrey Leech (pp. 167–180). London: Longman.
Booth, Barbara. 1985. Revising CLAWS. ICAME Journal 9: 29–35.
Atwell, E. S. 2008. Development of tag sets for part-of-speech tagging. In: A. Lüdeling & M. Kytö (eds.), Corpus Linguistics: An International Handbook, Volume 1, pp. 501–526. Walter de Gruyter. ISBN 978-3-11-021142-9.
McCoy, Kathy. "Part of Speech Tagging (Chapter 5)." Archived from the original on 2018-04-17.
"CLAWS4: THE TAGGING OF THE BRITISH NATIONAL CORPUS". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"UCREL home page, Lancaster UK". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"UCREL CLAWS1 (LOB) Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"UCREL CLAWS2 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"UCREL CLAWS5 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-20.
"UCREL CLAWS6 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"UCREL CLAWS7 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
"Stanford Log-linear Part-Of-Speech Tagger". The Stanford Natural Language Processing Group. Archived from the original on 2004-10-25.

External links

CLAWS part-of-speech tagger for English
