Lemur Project

The Lemur Project is a collaboration between the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst and the Language Technologies Institute at Carnegie Mellon University. The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri and Galago search engines, the ClueWeb09 and ClueWeb12 datasets, and the RankLib learning-to-rank library. The software and datasets are used widely in scientific and research applications, as well as in some commercial applications.

The Lemur Project's software development philosophy emphasizes state-of-the-art accuracy, flexibility, and efficiency. For example, the Indri search engine provides accurate search for large text collections "out of the box", and data is stored in an accessible manner to support development of new retrieval strategies. Software from the Lemur Project is distributed under open-source licenses that provide flexibility to scientists and software developers.

Lemur is written in C, C++, and Java, and it is distributed with its source files and build instructions. The provided source code can be modified to develop new libraries. It is compatible with several operating systems, including Linux and Windows.

Features

Lemur supports the following features:

Indexing:
  English, Chinese, and Arabic text
  Word stemming
  Stop words
  Tokenization
  Passage and incremental indexing
Retrieval:
  Ad hoc retrieval (TF-IDF and InQuery)
  Passage and cross-lingual retrieval
  Language modeling
    Query model updating
    Two stage smoothing
  Relevance feedback
  Structured query language
  Wildcard term matching
Distributed IR:
  Query-based sampling
  Database based ranking (CORI)
  Results merging
Document clustering
Summarization
Simple text processing

Components

The Lemur Project has the following components:

Indri search engine in C++
Galago search engine research framework in Java
RankLib learning-to-rank library
Sifaka data mining application
ClueWeb09 and ClueWeb12 datasets
Query Log Toolbar

Latest Version

Updates to the Lemur Project components are made twice a year, in June and December. The latest version of the Indri search engine is 5.17. The latest version of the Galago search engine is 3.18. The latest version of the RankLib learning-to-rank library is 2.14. The latest version of the Sifaka data mining application is 1.8.

Indri Search Engine

The Indri search engine is one of the components developed by the Lemur Project. It is open source. Indri's query language allows researchers to index data or structure documents using simple command-line instructions. Indri can be adapted to a variety of current applications. It can also be distributed across a cluster of nodes for high performance. The Indri search engine can handle large collections of data and can read various data formats, such as HTML and XML.

The Indri API supports various programming and scripting languages, such as C++, Java, C#, and PHP.

Features of Indri Search Engine

Can make use of multiple document representations
Explicit term weighting
Robust query language
Formally well-grounded
Highly effective
Can be efficiently implemented

See also

List of information retrieval libraries

External links

The Lemur Project website


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.