A comparative evaluation performed by Vasilescu et al. (2004) has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all-words data, they measure a 58% precision using the simplified Lesk algorithm compared to only 42% under the original algorithm.
Variations such as the Simplified Lesk algorithm have demonstrated improved precision and efficiency. However, the Lesk algorithm has faced criticism for its sensitivity to the exact wording of definitions and its reliance on brief glosses. Researchers have sought to improve its accuracy by incorporating additional resources such as thesauruses and syntactic models.
In the Simplified Lesk algorithm, the correct meaning of each word in a given context is determined individually, by locating the sense whose dictionary definition overlaps most with the given context. Rather than simultaneously determining the meanings of all words in a context, this approach tackles each word on its own.
The Lesk algorithm is based on the assumption that words in a given "neighborhood" (section of text) will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet.
Note: the Vasilescu et al. implementation considers a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all possible meanings lead to zero overlap with the current context or with other word definitions are by default assigned sense number one in WordNet.
Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003): In the adapted Lesk algorithm, a word vector is created corresponding to every content word in the WordNet gloss. Concatenating the glosses of related concepts in WordNet can be used to augment this vector. The vector contains the co-occurrence
The Lesk algorithm is a classical algorithm for word sense disambiguation, introduced by Michael E. Lesk in 1986. It operates on the premise that words within a given context are likely to share a common meaning. The algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense.
A great deal of subsequent work has offered modifications of this algorithm. These works use other resources for analysis (thesauruses, synonym dictionaries, or morphological and syntactic models): for instance, they may use such information as synonyms, derivatives, or words drawn from the definitions of the words that appear in definitions.
Unfortunately, Lesk's approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation because dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.
counts of words co-occurring with w in a large corpus. Adding the word vectors for all the content words in its gloss creates the gloss vector g for a concept. Relatedness is then determined by comparing the gloss vectors using the cosine similarity measure.
The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
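As a rough illustration of such an overlap count, the following sketch counts the content words shared by two word sets; the stop-word list here is an illustrative assumption, not taken from the original paper.

```python
# Sketch of a COMPUTEOVERLAP-style count: words common to two sets,
# ignoring function words on a stop list (the list here is illustrative).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "which"}

def compute_overlap(signature, context):
    """Return the number of non-stop words shared by the two word sets."""
    return len((set(signature) & set(context)) - STOP_WORDS)

# Example: "evergreen" is the only shared content word ("of" is stopped).
sig = {"fruit", "of", "certain", "evergreen", "trees"}
ctx = {"kinds", "of", "evergreen", "tree", "with", "needle-shaped", "leaves"}
print(compute_overlap(sig, ctx))  # prints 1
```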
For every sense of the word being disambiguated, one counts the number of words that occur both in the neighborhood of that word and in the dictionary definition of that sense.
In SIGDOC '86: Proceedings of the 5th Annual International Conference on Systems Documentation, pages 24–26, New York, NY, USA. ACM.
A frequently used example illustrating this algorithm is for the context "pine cone". The following dictionary definitions are used:
CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees
Banerjee, Satanjeev; Pedersen, Ted (2002-02-17). "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet".
This approach tackles each word individually, independently of the meanings of the other words occurring in the same context.
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone
In Proceedings of the 2nd International Conference on Language Resources and Evaluation, LREC, Athens, Greece.
(in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.
Lecture Notes in Computer Science. Vol. 2276. Springer, Berlin, Heidelberg. pp. 136–145.
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
The sense that is chosen is the sense with the largest such count.
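This selection rule amounts to an argmax over per-sense overlap counts. A minimal sketch, under the assumption that each sense's definition and the neighborhood are available as word sets (the names and data here are illustrative):

```python
def choose_sense(sense_definitions, neighborhood):
    """Pick the sense whose definition shares the most words with the
    neighborhood. sense_definitions maps a sense id to its word set."""
    return max(sense_definitions,
               key=lambda s: len(sense_definitions[s] & neighborhood))

# Toy example: sense "b" shares two words with the neighborhood, "a" none.
senses = {"a": {"solid", "body", "point"}, "b": {"fruit", "evergreen", "trees"}}
print(choose_sense(senses, {"evergreen", "trees", "needle"}))  # prints b
```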
Simplified LESK Algorithm with smart default word sense (Vasilescu et al., 2004)
Automatic resolution of ambiguity of word senses in dictionary definitions
Lecture Notes in Computer Science, Vol. 2276, pages 136–145, 2002.
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet
Evaluating Variants of the Lesk Approach for Disambiguating Words
Florentina Vasilescu, Philippe Langlais, and Guy Lapalme. 2004.
As can be seen, the best intersection is Pine #1 ∩ Cone #3 = 2.
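Under naive whitespace tokenization of the two definitions above (a simplifying assumption), the shared tokens can be checked directly:

```python
# Word sets for Pine #1 and Cone #3, taken from the definitions above.
pine_1 = set("kinds of evergreen tree with needle-shaped leaves".split())
cone_3 = set("fruit of certain evergreen trees".split())

shared = pine_1 & cone_3
print(sorted(shared))  # ['evergreen', 'of']
print(len(shared))     # 2
```

With this literal tokenization the overlap of 2 comes from "evergreen" and "of"; a variant that stems "tree"/"trees" and drops function words would instead count "evergreen" and "tree".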
A number of studies have examined Lesk and its extensions:
Computational Linguistics and Intelligent Text Processing
Word Sense Disambiguation: Algorithms and Applications
Agirre, Eneko & Philip Edmonds (eds.). 2006.
ACM Computing Surveys, 41(2), 2009, pp. 1–69.
An implementation might look like this:

function SIMPLIFIED LESK(word, sentence) returns best sense of word
    best-sense <- most frequent sense for word
    max-overlap <- 0
    context <- set of words in sentence
    for each sense in senses of word do
        signature <- set of words in the gloss and examples of sense
        overlap <- COMPUTEOVERLAP(signature, context)
        if overlap > max-overlap then
            max-overlap <- overlap
            best-sense <- sense
    end return (best-sense)
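The same procedure can be sketched in Python; the toy sense inventory, whitespace tokenization, and gloss format below are illustrative assumptions, not part of the original algorithm:

```python
# Sketch of simplified Lesk: choose the sense whose gloss shares the most
# words with the sentence context; zero overlap falls back to the first
# (assumed most frequent) sense. The tiny dictionary is illustrative.
SENSES = {
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}

def simplified_lesk(word, sentence):
    context = set(sentence.lower().split())
    best_sense, max_overlap = 0, 0  # default: first sense
    for i, gloss in enumerate(SENSES[word]):
        overlap = len(set(gloss.split()) & context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, i
    return best_sense

# Sense index 2 ("fruit of certain evergreen trees") wins for this context.
print(simplified_lesk("cone", "kinds of evergreen tree with needle-shaped leaves"))  # prints 2
```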
Dordrecht: Springer. www.wsdbook.org
Satanjeev Banerjee and Ted Pedersen.
Alexander Gelbukh, Grigori Sidorov.
Kilgarriff and J. Rosenzweig. 2000.
Word Sense Disambiguation: A Survey
English SENSEVAL: Report and Results
Kilgarriff and Rosenzweig, 2000;
Wilks and Stevenson, 1998, 1999;
Nastase and Szpakowicz, 2001;
Gelbukh and Sidorov, 2004.
Original Lesk (Lesk, 1986)
Word-sense disambiguation
Simplified Lesk algorithm
doi:10.1007/3-540-45715-1_11
Pook and Catlett, 1988;
Mahesh et al., 1997;
Cowie et al., 1992;
Roberto Navigli.
LREC, Portugal.
Lesk, M. (1986).
CiteSeerX 10.1.1.118.8359
Yarowsky, 1992;
ISBN 978-3540457152
ISBN 3-540-43219-1
Lesk variants
Kwong, 2001;
References
Criticisms
See also
Overview