Knowledge (XXG)

Full-text search

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references).

In a full-text search, a search engine examines all of the words in every stored document as it tries to match search criteria (for example, text specified by a user). Full-text-searching techniques appeared in the 1960s, for example IBM STAIRS from 1969, and became common in online bibliographic databases in the 1990s. Many websites and application programs (such as word processing software) provide full-text-search capabilities. Some web search engines, such as the former AltaVista, employ full-text-search techniques, while others index only a portion of the web pages examined by their indexing systems.

Indexing

When dealing with a small number of documents, it is possible for the full-text-search engine to directly scan the contents of the documents with each query, a strategy called "serial scanning". This is what some tools, such as grep, do when searching.

However, when the number of documents to search is potentially large, or the quantity of search queries to perform is substantial, the problem of full-text search is often divided into two tasks: indexing and searching. The indexing stage will scan the text of all the documents and build a list of search terms (often called an index, but more correctly named a concordance). In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents.

The indexer will make an entry in the index for each term or word found in a document, and possibly note its relative position within the document. Usually the indexer will ignore stop words (such as "the" and "and") that are both common and insufficiently meaningful to be useful in searching. Some indexers also employ language-specific stemming on the words being indexed. For example, the words "drives", "drove", and "driven" will be recorded in the index under the single concept word "drive".

The precision vs. recall tradeoff

Figure: Diagram of a low-precision, low-recall search

Recall measures the quantity of relevant results returned by a search, while precision is the measure of the quality of the results returned. Recall is the ratio of relevant results returned to all relevant results. Precision is the ratio of the number of relevant results returned to the total number of results returned.

The diagram represents a low-precision, low-recall search. In the diagram the red and green dots represent the total population of potential search results for a given search. Red dots represent irrelevant results, and green dots represent relevant results. Relevancy is indicated by the proximity of search results to the center of the inner circle. Of all possible results shown, those that were actually returned by the search are shown on a light-blue background. In the example only 1 relevant result of 3 possible relevant results was returned, so the recall is a very low ratio of 1/3, or 33%. The precision for the example is a very low 1/4, or 25%, since only 1 of the 4 results returned was relevant.

Due to the ambiguities of natural language, full-text-search systems typically include options like filtering to increase precision and stemming to increase recall. Controlled-vocabulary searching also helps alleviate low-precision issues by tagging documents in such a way that ambiguities are eliminated. The trade-off between precision and recall is simple: an increase in precision can lower overall recall, while an increase in recall lowers precision.

See also: Precision and recall

False-positive problem

Full-text searching is likely to retrieve many documents that are not relevant to the intended search question. Such documents are called false positives (see Type I error). The retrieval of irrelevant documents is often caused by the inherent ambiguity of natural language. In the sample diagram, false positives are represented by the irrelevant results (red dots) that were returned by the search (on a light-blue background).

Clustering techniques based on Bayesian algorithms can help reduce false positives. For a search term of "bank", clustering can be used to categorize the document/data universe into "financial institution", "place to sit", "place to store" etc. Depending on the occurrences of words relevant to the categories, search terms or a search result can be placed in one or more of the categories. This technique is being extensively deployed in the e-discovery domain.

Performance improvements

The deficiencies of full text searching have been addressed in two ways: by providing users with tools that enable them to express their search questions more precisely, and by developing new search algorithms that improve retrieval precision.

Improved querying tools

Keywords. Document creators (or trained indexers) are asked to supply a list of words that describe the subject of the text, including synonyms of words that describe this subject. Keywords improve recall, particularly if the keyword list includes a search word that is not in the document text.

Field-restricted search. Some search engines enable users to limit full text searches to a particular field within a stored data record, such as "Title" or "Author."

Boolean queries. Searches that use Boolean operators (for example, "encyclopedia" AND "online" NOT "Encarta") can dramatically increase the precision of a full text search. The AND operator says, in effect, "Do not retrieve any document unless it contains both of these terms." The NOT operator says, in effect, "Do not retrieve any document that contains this word." If the retrieval list retrieves too few documents, the OR operator can be used to increase recall; consider, for example, "encyclopedia" AND "online" OR "Internet" NOT "Encarta". This search will retrieve documents about online encyclopedias that use the term "Internet" instead of "online." This increase in precision is very commonly counter-productive since it usually comes with a dramatic loss of recall.

Phrase search. A phrase search matches only those documents that contain a specified phrase, such as "Knowledge (XXG), the free encyclopedia."

Concept search. A search that is based on multi-word concepts, for example Compound term processing. This type of search is becoming popular in many e-discovery solutions.

Concordance search. A concordance search produces an alphabetical list of all principal words that occur in a text with their immediate context.

Proximity search. A proximity search matches only those documents that contain two or more words that are separated by no more than a specified number of words; a search for "Knowledge (XXG)" WITHIN2 "free" would retrieve only those documents in which the words "Knowledge (XXG)" and "free" occur within two words of each other.

Regular expression. A regular expression employs a complex but powerful querying syntax that can be used to specify retrieval conditions with precision.

Fuzzy search. A fuzzy search will search for documents that match the given terms and some variation around them (using, for instance, edit distance to threshold the multiple variation).

Wildcard search. A search that substitutes one or more characters in a search query for a wildcard character such as an asterisk. For example, using the asterisk in a search query "s*n" will find "sin", "son", "sun", etc. in a text.

Improved search algorithms

The PageRank algorithm developed by Google gives more prominence to documents to which other Web pages have linked. See Search engine for additional examples.

In practice, it may be difficult to determine how a given search engine works. The search algorithms actually employed by web-search services are seldom fully disclosed out of fear that web entrepreneurs will use search engine optimization techniques to improve their prominence in retrieval lists.

Software

The following is a partial list of available software products whose predominant purpose is to perform full-text indexing and searching. Some of these are accompanied with detailed descriptions of their theory of operation or internal algorithms, which can provide additional insight into how full-text search may be accomplished.

Free and open source software

Apache Lucene
Apache Solr
ArangoSearch
BaseX
KinoSearch
Lemur/Indri
MariaDB
mnoGoSearch
MySQL
OpenSearch
PostgreSQL
Searchdaimon
Sphinx
Swish-e
Terrier IR Platform
Xapian

Proprietary software

Algolia
Autonomy Corporation
Azure Search
Bar Ilan Responsa Project
Basis database
Brainware
BRS/Search
Concept Searching Limited
Dieselpoint
dtSearch
Elasticsearch
Endeca
Exalead
Fast Search & Transfer
Inktomi
Lucid Imagination
MarkLogic
SAP HANA
Swiftype
Thunderstone Software LLC.
Vertex AI Search
Vespa
Vivísimo

See also

Pattern matching and string matching
Compound term processing
Enterprise search
Information extraction
Information retrieval
Faceted search
WebCrawler, first FTS engine
Search engine indexing - how search engines generate indices to support full-text searching
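As a worked check of the precision and recall figures quoted in this article (1 relevant result returned out of 3 relevant results, with 4 results returned in total), the two ratios can be computed directly. This is only a sketch: the function name `precision_recall` and the result labels are invented for illustration and are not part of any search product.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned and relevant results."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant results that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labels: "g" for relevant (green) and "r" for irrelevant (red) results.
relevant = {"g1", "g2", "g3"}
returned = {"g1", "r1", "r2", "r3"}

precision, recall = precision_recall(returned, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Returning more results can only raise or hold recall, but every irrelevant result added lowers precision, which is the trade-off the article describes.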
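The Boolean operators AND, OR, and NOT map naturally onto set intersection, union, and difference over the sets of matching documents. The sketch below uses a made-up four-document collection and a hypothetical helper `postings`. It also assumes the OR in the article's example is meant to group as "encyclopedia" AND ("online" OR "Internet") NOT "Encarta", which the text does not state explicitly.

```python
# Made-up documents for illustration only.
documents = {
    1: "an online encyclopedia",
    2: "an Internet encyclopedia",
    3: "Encarta was an online encyclopedia",
    4: "an online dictionary",
}


def postings(term):
    """Return the ids of documents that contain the term as a whole word."""
    term = term.lower()
    return {doc_id for doc_id, text in documents.items()
            if term in text.lower().split()}


# "encyclopedia" AND "online" NOT "Encarta"
narrow = (postings("encyclopedia") & postings("online")) - postings("Encarta")

# "encyclopedia" AND ("online" OR "Internet") NOT "Encarta"  (assumed grouping)
broad = (postings("encyclopedia")
         & (postings("online") | postings("Internet"))) - postings("Encarta")

print(sorted(narrow))
print(sorted(broad))
```

In this toy collection the OR variant also retrieves the document that says "Internet" rather than "online", which is the recall gain the article attributes to OR.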

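The two-stage approach described in this article (an indexing stage that builds a concordance, followed by a search stage that consults only that index) can be sketched as follows. The stop-word list and the stem table are tiny hand-written assumptions rather than a real language-specific stemmer, and the names `build_index` and `search` are illustrative only.

```python
import re
from collections import defaultdict

# Assumed for illustration: a tiny stop-word list and a hand-written stem table.
STOP_WORDS = {"the", "and", "a", "of", "to"}
STEMS = {"drives": "drive", "drove": "drive", "driven": "drive"}


def normalize(word):
    """Lowercase a word and map it to its concept word, if one is known."""
    word = word.lower()
    return STEMS.get(word, word)


def build_index(documents):
    """Indexing stage: map each term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, raw in enumerate(re.findall(r"[A-Za-z']+", text)):
            if raw.lower() in STOP_WORDS:
                continue  # stop words are not indexed
            index[normalize(raw)].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Search stage: consult only the index, never the original text."""
    return sorted(index.get(normalize(term), {}))


documents = {
    1: "She drives to the bank",
    2: "He drove and she has driven",
    3: "The bank of the river",
}
index = build_index(documents)
print(search(index, "drive"))  # ids of documents containing a form of "drive"
print(search(index, "bank"))
print(search(index, "the"))    # stop word, so nothing was indexed for it
```

Storing word positions alongside each document id is what would later allow phrase and proximity queries to be answered from the index alone.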

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.