Knowledge (XXG)

Talk:Faceted search

classified (with high scoring of metadata) to one or more buckets and the means updated with the original seed marked as special. At the end we have groupings of documents and mean vectors still containing the marked seed yet with other related dimensions relevant to the cluster. After completion accumulate the means into a single list of unique dimensions and counts. Is this clustering or classification now? It certainly still has the property of a hill-climber like k-means, yet the outputs are a set of dimensions, counts & membership weights that can be used for faceting the documents (possibly with facets they didn't explicitly contain a priori!). I'm cheating here: I wrote a paper and code (with co-workers) extending this method to a hierarchical clustering.
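For readers who want something concrete, here is a rough sketch of the seeded k-means idea described in this comment. It is not the code from the paper mentioned above. The cosine scoring, the membership threshold, the seed weight of 1.0, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def seeded_soft_kmeans(docs, seeds, threshold=0.3, iterations=5):
    """docs: {doc_id: {dimension: weight}}; seeds: facet-label dimensions.

    Returns (means, members, counts). The threshold and the pinned seed
    weight are illustrative assumptions, not part of the original method.
    """
    means = {s: {s: 1.0} for s in seeds}      # one bucket per facet label
    members = {s: {} for s in seeds}
    for _ in range(iterations):
        members = {s: {} for s in seeds}
        for doc_id, vec in docs.items():
            for s, mean in means.items():
                score = cosine(vec, mean)
                if score >= threshold:        # a document may join several buckets
                    members[s][doc_id] = score
        for s in seeds:
            total = defaultdict(float)
            for doc_id, weight in members[s].items():
                for dim, w in docs[doc_id].items():
                    total[dim] += weight * w
            norm = sum(members[s].values())
            mean = {d: v / norm for d, v in total.items()} if norm else {}
            mean[s] = 1.0                     # keep the original seed marked as special
            means[s] = mean
    # Accumulate the means into unique dimensions and how many means contain each.
    counts = Counter(d for mean in means.values() for d in mean)
    return means, members, counts

# Hypothetical documents: metadata dimensions get a higher weight than plain terms.
docs = {
    "d1": {"brand:acme": 2.0, "widget": 1.0},
    "d2": {"type:widget": 2.0, "widget": 1.0},
    "d3": {"brand:acme": 2.0, "type:widget": 2.0},
}
seeds = ["brand:acme", "type:widget"]
means, members, counts = seeded_soft_kmeans(docs, seeds, iterations=1)
```

With a single pass, d3 lands in both buckets, which is the multiple-membership behaviour being argued for. Running more iterations lets documents drift into buckets whose seed they never contained, which matches the remark about facets the documents did not contain a priori.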
results, classifications of results to external data such as taxonomies and folksonomies, etc. (In my opinion) back in the day Oren Etzioni's Grouper was doing faceted search via simple clustering of results, as was Northern Light faceting results via a pre-computed taxonomy (the loose definition - see below). I'm also not sure that the act of doing an imprecise classification into a taxonomy implies that an object could not have a measure of fitness into more than one node of the taxonomy... or that the word 'taxonomy' must be taken to mean that nodes/leaves of the tree must live in only one place in the tree.
needs to limit itself to data for which faceted classification has been applied, or it needs to drop the part about mass market search, since that is merely about offering useful limiters on the page. Also, there's nothing faceted about WorldCat AFAIK, other than their use of FAST, but because FAST does not link the facets it is actually a removal of facets rather than an application of them. In other words, I think this article is highly problematic, at best. Even the references are poor. I seriously doubt the information presented here.
about what gets put together. Whereas with facets, the documents are grouped in semantically meaningful ways. This is because a human is in the loop. A human says, "What type of metadata can we talk about? Let's group by that." Conversely, in clustering, a machine puts the documents into groups and then a human says, "Okay, so why are these documents grouped together?" We've seen before (I'm sure I can find a citation if requested) that labeling clusters of documents is tricky, and it's even more tricky to get machines to do it well.
classification system..." This clause "according to a faceted classification system" is using "faceted" to describe a "faceted search", which is horrible English because you have not defined the term "faceted". When using technical terms, you need to define what they mean, otherwise you lose all the people who want to read this article but do not have the technical background relating specifically to this topic.
(my commercial implementations suffered from these issues), once meta-data is included as a heavy factor and multiple assignment is allowed the results can become indistinguishable from faceting. Both techniques are functions that produce a filtering/grouping of results by some label, where the assignment of the labels to documents is governed by some extraction + classification task.
click one of the blue right arrows. Now you have a list of things related to Bill Clinton that only belong in that category. You got to this by filtering based on Category (you can also do it based on information via the 'About' tab). Click the 'Your query' link to view the filters, the controls there allow you to remove and add filters.
Please read it again. I took pains to state that clustering does not have to imply single membership. It's possible to build a faceting algorithm that uses a clustering approach and pre-labeled facets. There is a strong tendency to assume that clustering implies singular membership in a cluster and that
2. Facets group the documents into semantically meaningful ways. We know this, because the facets that describe the document are manually chosen. In clustering, documents are automatically grouped based on some similarity metric, such as cosine similarity, or correlation, or probabilistic methods.
of the information being searched? If so, do we agree that not all ways of organizing result sets use faceted classification? Specifically, can we agree that using a pairwise document similarity measure to arrange documents into groups doesn't use a faceted classification? And that neither does using
It seems that a recurring issue on all Knowledge (XXG) entries related to search is that companies want to be mentioned in the entry (see the previous two sections as examples). I've included only a handful of companies in this entry that are not only notable enough to have Knowledge (XXG) entries,
Regarding the addition of razorbase.com to the list, the article says that FBs "allow users to explore by filtering available information". Go to the home page, type "Bill Clinton", click the 'named' link, then choose 'connected to'. Now in the resulting page, click the Categories tab, then
This smells like a straw-man argument to me. Are you making the assumption that clustering means each document lives in only one group? Or that one could not do a first-level clustering based upon meta-data (like faceting uses)? While I agree that vanilla clustering has the problems she described
As a software engineer and consultant with over 20 years' experience, I have dealt with stakeholders from all walks of life - from secretaries to CEOs, managers, factory floor assembly workers and engineers, etc. Those are the people Knowledge (XXG) needs to reach and this article surely does not
Nonetheless, I refuse to let this entry become overrun with mentions from companies that don't meet the above criteria--that quickly devolves into spam. I'd sooner remove all company mentions--and even mentions of open-source software if those are controversial too. I've been maintaining this page
locations. You don't have to decide if an Acme Widget is more "Acme" or more "Widget," it's both, because it 'is' both. However this doesn't explain where the facets come from. When we cluster documents together, we're just putting together statistically similar documents. There's no semantics
Maybe this boils down to the interpretation/semantics of the words 'clustering' and 'faceting'. If one assumes that faceting implies multiple membership (dimensions) in meta-data and clustering must be interpreted as being the application of some classic-ish clustering-algorithm then I agree with
In the distinction made for "multiple classifications" vs. a "single, pre-determined, taxonomic order," I'm not sure that "taxonomic" should be used to describe a hierarchical type of organization, since taxonomies are not generally required to be "single" or "pre-determined" or hierarchical as is
I disagree with the statement that faceted search is search against data organized with a faceted classification. There are very very few actual faceted classifications in use, and most online sites with facets are simply using regular metadata attributes to provide limits. So this article either
The very first question on the page is about the distinction between faceted metadata and taxonomy. Maybe we could make the differences more specific, like this 'facets might include topics, subjects, or concepts (like traditional taxonomies), but facets are not limited to those elements'. For
I don't think faceted-search implies this distinction at all! How about faceted search as a method of classifying search results according to one or more methods of extracting patterns from the results. Examples include patterns derived from text-clustering the results, linked attributes of
I can agree with your points and I'm mostly on the same page with you. However one could define a pairwise document similarity that uses metadata as a highly weighted feature in the comparison. Example: Imagine k-means that is seeded (non-randomly) with n buckets of n "facet-labels". Each document is
The wording of this article is recursive, which is very poor English. First, the article needs to begin with a simple definition of faceted search. "Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted
New distinction via Dtunkelang (private communication). Facets are defined to be key-value pairs where a document can be associated with multiple keys and each key may have multiple value assignments. Machine assignment is allowable. Since classic clustering approaches (bag of words) are not
I talked with Sherman Monroe (who wrote the previous comment) about the razorbase.com link. I still am not convinced either that it is a faceted browser or that, even if it were, it would be an appropriate external link. We've moved our private dialogue into the talk page to promote public
example, few taxonomies would include structures for price, size, compatibility or date, while these are common facets in online catalogs. That way we could remove the question of taxonomy structure and concentrate on the meaningful differences between various data structures. --
I guess I'm arguing for a definition of faceting that supports any suitable method of assigning documents to a set of n distinct dimensions and subsequently allowing a UI to filter based upon that membership... without being unnecessarily pejorative to 'clustering algorithms'.
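To make that definition concrete, here is a minimal sketch of the filtering half: documents have already been assigned to dimensions by whatever method, and the UI narrows the result set by membership. The function name, the dimension labels, and the sample documents are hypothetical.

```python
def filter_by_facets(assignments, selected):
    """Return ids of documents that belong to every selected dimension.

    assignments: {doc_id: set of dimension labels}, produced by any method
                 (manual tagging, classification, clustering, ...)
    selected:    dimension labels the user picked in the UI
    """
    wanted = set(selected)
    return sorted(doc_id for doc_id, dims in assignments.items() if wanted <= dims)

# Hypothetical assignments; how they were produced does not matter to the filter.
assignments = {
    "doc1": {"brand:Acme", "type:widget"},
    "doc2": {"brand:Acme", "type:gadget"},
    "doc3": {"brand:Other", "type:widget"},
}
acme = filter_by_facets(assignments, ["brand:Acme"])
acme_widgets = filter_by_facets(assignments, ["brand:Acme", "type:widget"])
```

The filter only sees membership, so it cannot tell whether a label came from a human cataloguer or from a clustering algorithm.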
I'm going to start taking a hard line on external links: no links to pages that are just example applications, and no purely commercial links. Links should either be to free, open-source software or to educational materials. Knowledge (XXG) is not a sales and marketing tool.
that I agree with. There is a looser category of exploratory search interfaces, but I don't think we should call them all faceted if they're not. What does faceted mean, if not that there are multiple facets? I'm not knocking other approaches, just trying to be precise.
structured into multiple keys it's not faceting. This definition does NOT preclude usage of clustering algorithms to infer or generate new key-value assignments to documents. While faceting is a form of result refinement, not all result refinement is faceting.
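As an illustration of the key-value definition, here is a small sketch where each document carries several keys and a key may hold several values. The per-pair counts are what a faceted UI would typically show next to each refinement. The sample data, including the color key, is made up for the example.

```python
from collections import Counter

# Hypothetical documents: each maps facet keys to a set of values.
documents = {
    "doc1": {"brand": {"Acme"}, "type": {"widget"}, "color": {"red", "blue"}},
    "doc2": {"brand": {"Acme"}, "type": {"gadget"}},
    "doc3": {"type": {"widget"}, "color": {"red"}},
}

def facet_counts(documents):
    """Count how many documents carry each (key, value) pair."""
    counts = Counter()
    for facets in documents.values():
        for key, values in facets.items():
            for value in values:
                counts[(key, value)] += 1
    return counts

counts = facet_counts(documents)
```

Nothing here says where the pairs come from, so a clustering or classification step could add new key-value assignments to a document before the counts are taken.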
It is truly shameful that a person can't look at this article and grasp in the first two sentences what "faceted browsing" or "faceted searching" or "faceted navigation" means in simple terms. I consider that a real laziness on the part of the author.
"A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order."
Knowledge (XXG) is going to lose its interest to a wide variety of stakeholders if the articles become too academic with nothing for the lay person to read and understand. Save the heavy academic writing for later in the article.
Looks like I'm having a little reversion war with 78.105.108.216 over the inclusion of the following sentence: "Newer solutions employing faceted search are increasingly being offered to retailers by companies such as
That does it. I've eliminated all references to companies, including ones I feel are worth including. Hopefully everyone can live with this as a fair solution. I refuse to let this entry become a cesspool of spam.
to produce labels for the clusters. The point of the kmeans derived algorithm was to demonstrate this idea. I'm arguing that 'clustering' should not be used as a straw-man to define faceting as NOT clustering.
1. Documents are categorized into multiple orthogonal hierarchies, whereas in clustering a document belongs to only one cluster. Even in hierarchical clustering, the document belongs to only one hierarchy.
This is a classic example of computer science trying to be excessively abstract to prove sophistication and just muddling up a simple concept. Just say "search attributes" like they did in the 1980s.
Nealrichter, I actually think you have boiled down a useful technical description of the basic distinction. I realize there are other elements, but your formulation (with notes) works for me:
We have a good working definition now that faceting means multiple membership of documents to multiple possibly orthogonal facets and that the facets should have some semantic meaning.
but have established associations with faceted search. Since I have a past affiliation with Endeca, I could be accused of bias, but I count on others to keep the entry honest.
. Such solutions can enable faceted search results to be ordered based on their relevance, rather than simply filtered in or out entirely." I think it's spammy, and that
I'm concerned that any site that is an "example of faceted search" might show up in the external links. Can we agree on a standard of notability and/or content type?
faceting implies multiple membership (dimensions) in meta-data and clustering must be interpreted as being the application of some classic-ish clustering-algorithm
the cluster labeling problem is too hard. Neither of these is iron clad true if you allow multiple membership in clusters and utilize the same meta-data used by
Over half the citations and a big gob of the text are devoted to the work of one researcher, plus colleagues. One of the cites might be to self-published work.
a single hierarchical topic organization? If I understand what you mean about faceting implying multiple membership, then we're all on the same page.
should probably be re-added to the current article. Does anyone know of a good reason why that information was removed and therefore why it should
Can you suggest more precise wording? The distinction is an important one, and I think the only issue is finding the precise words to describe it.
IT personnel have been accused for decades of not being able to communicate well. This article is a clear example of that.
An introduction should be simple, not delving deep into the topic. That should be saved for the following paragraphs.
Over the years, many academic references, descriptions, and other ‘non-spammy’ details were removed, e.g.:
Do facets only apply to information elements stored in a predefined order? (implied as part of the definition here)
986: 461:, like Dtunkelang said. That means that unlike the clustering approach outlined by Nealrichter above, a document 161: 1095:) that was redirected here - it is not clear how much of that article’s content was actually migrated, however. 1010: 1029: 948: 39: 21: 978: 675: 535: 203: 940: 873: 865: 667: 576: 238: 566: 458: 380: 91: 944: 1103: 1018: 647: 623: 584: 428: 352: 312: 94:
966: 717: 960:
Your points do indeed deserve more attention, and seem to be somewhat related to other comments above.
869: 752:
with something of an iron fist, but I'm open to discussion here if anyone disagrees with my approach.
, but all of the information regarding non-website applications appears to have since been removed.
live in exactly one location, but rather exists in multiple locations. If we had two facets,
1099: 1014: 896: 1038: 802: 787: 768: 753: 729: 689: 385: 263: 214: 171: 716:, which is marked as an orphan, isn't notable enough for inclusion. I'm being accused of 824: 1115: 728:. Perhaps others without any real or perceived conflicts of interest can chime in. 989:
appear specifically focused on computer programming theory, for example. Perhaps
We shouldn't ignore the other, broader, issues, but this is a darn good start. --
258: 892: 70: 52: 528:
This can lead to clusters that appear to be noisy when judged by human users.
87: 910:
These database concepts are ancient. Why does this new terminology exist?
Marti Hearst makes a distinction between clustering and faceted search in
83: 1107: 900: 832: 810: 795: 776: 761: 737: 697: 679: 651: 627: 592: 539: 432: 393: 379:
Let me try a different tack: do we agree that faceted search assumes a
356: 316: 271: 242: 222: 207: 725: 721: 905: 457:
I think the fundamental thing that faceted search requires is the
"Clustering versus faceted categories for information exploration"
a classification? (probably a specific value of a "dimension")
141: 15: 906:
Aren't facets just database views, searches, and ordering?
154:
contributor may be personally or professionally connected
1092: 1079: 1048: 307:
Hearst. Yet those two words have broader meanings.
Finally, there was also a separate article titled
At the very least, a reference to something like
Related: Does Knowledge (XXG) have an existing,
So in conclusion: No, clustering is not facets.
Category:Library cataloging and classification
last revision prior to blanking and redirect
This page was long ago moved from the title
, then an Acme Widget exists in both the
The language is abstract and ambiguous:
FS different from faceted classification
Parametric search (Information science)
because of my past affiliation with
another example of removed content
Parametric search (user interface)
Category:Knowledge representation
last revision prior to page move
razorbase as a faceted browser
Informative Faceted Searching
be restored in some fashion?
(Most of the items listed at
that we could link to here?
Information search methods
Category:Search algorithms
a dimension of an element?
Faceted semantic browsers
, outline or summary of
may provide more leads?)
811:13:37, 13 May 2010 (UTC) 724:and my present one with 698:15:12, 19 May 2009 (UTC) 680:15:06, 19 May 2009 (UTC) 652:18:10, 21 May 2009 (UTC) 628:02:47, 22 May 2009 (UTC) 593:23:45, 21 May 2009 (UTC) 540:22:08, 21 May 2009 (UTC) 433:19:21, 21 May 2009 (UTC) 394:18:38, 21 May 2009 (UTC) 357:18:10, 21 May 2009 (UTC) 317:16:39, 21 May 2009 (UTC) 272:15:47, 21 May 2009 (UTC) 243:15:19, 21 May 2009 (UTC) 979:Multidimensional search 923:an information element? 901:20:07, 8 May 2015 (UTC) 796:18:19, 7 May 2010 (UTC) 658:Razorbase External Link 1132:All Computing articles 567:faceted classification 459:faceted classification 381:faceted classification 92:information technology 28:This article is rated 861:communicate to them. 166:neutral point of view 79:WikiProject Computing 32:on Knowledge (XXG)'s 1056:A subsection titled 158:conflict of interest 110:Computing articles 34:content assessment 981:seems warranted. 967:Search attributes 943:comment added by 882: 868:comment added by 670:comment added by 596: 579:comment added by 190: 189: 140: 139: 136: 135: 132: 131: 1144: 1059:Faceted browsers 1047:(see, e.g., the 956: 881: 862: 743:Company Mentions 682: 595: 573: 246: 149: 148: 142: 112: 111: 108: 105: 102: 73: 66: 65: 55: 48: 31: 25: 24: 16: 1152: 1151: 1147: 1146: 1145: 1143: 1142: 1141: 1112: 1111: 1039:Faceted browser 1034: 938: 908: 888: 863: 840: 821: 784: 745: 705: 665: 660: 574: 236: 195: 146: 109: 106: 103: 100: 99: 29: 12: 11: 5: 1150: 1148: 1140: 1139: 1134: 1129: 1124: 1114: 1113: 1083: 1082: 1054: 1053: 1033: 1027: 1026: 1025: 1003: 1001: 1000: 999: 963: 961: 945:BenjaminGSlade 931: 930: 927: 924: 907: 904: 887: 884: 839: 836: 820: 814: 783: 782:External Links 780: 744: 741: 704: 701: 659: 656: 655: 654: 638: 637: 636: 635: 634: 633: 632: 631: 630: 608: 607: 606: 605: 604: 603: 602: 601: 600: 599: 598: 597: 570: 551: 550: 549: 548: 547: 546: 545: 544: 543: 542: 529: 516: 515: 514: 513: 512: 511: 510: 509: 508: 507: 504: 492: 491: 490: 489: 488: 487: 486: 
286: 285: 284: 283: 282: 281: 280: 273: 269: 265: 260: 256: 255: 254: 253: 252: 251: 244: 240: 235: 230: 229: 228: 227: 224: 220: 216: 212: 211: 210: 209: 205: 201: 192: 183: 180: 177: 173: 170: 169: 167: 163: 162:autobiography 159: 155: 151: 144: 143: 127: 121: 118: 117: 114: 97: 93: 89: 85: 81: 80: 75: 72: 68: 67: 63: 60: 57: 54: 50: 45: 41: 35: 27: 23: 18: 17: 1097: 1089: 1084: 1074: 1069: 1064: 1063: 1058: 1057: 1055: 1046: 1037: 1035: 1006: 984: 939:— Preceding 935: 932: 919:So what is: 918: 915: 912: 909: 889: 864:— Preceding 859: 856: 852: 848: 845: 841: 822: 799: 785: 765: 750: 746: 706: 688:discussion. 686: 662: 661: 478: 474: 470: 466: 462: 341: 200:69.91.164.31 196: 178: 153: 77: 40:WikiProjects 1100:Jim Grisham 1015:Jim Grisham 870:84.24.63.85 718:WP:CONFLICT 666:—Preceding 644:Searchtools 620:Nealrichter 581:Nealrichter 575:—Preceding 479:type=widget 425:Nealrichter 349:Searchtools 309:Nealrichter 237:—Preceding 234:Nealrichter 30:Start-class 1116:Categories 1051:; c. 2008) 803:Dtunkelang 788:Dtunkelang 769:Dtunkelang 754:Dtunkelang 730:Dtunkelang 714:PrismaStar 710:PrismaStar 703:PrismaStar 690:Dtunkelang 475:brand=Acme 386:Dtunkelang 264:Dtunkelang 215:Dtunkelang 172:Dtunkelang 825:Yakushima 101:Computing 88:computing 84:computers 59:Computing 953:contribs 941:unsigned 878:contribs 866:unsigned 819:concerns 817:WP:UNDUE 668:unsigned 589:contribs 577:unsigned 463:does not 193:Taxonomy 182:contribs 239:undated 893:LaMona 726:Google 722:Endeca 164:, and 90:, and 36:scale. 1090:(see 977:, or 467:brand 1104:talk 1019:talk 993:and 949:talk 897:talk 874:talk 829:talk 807:talk 792:talk 773:talk 758:talk 734:talk 694:talk 676:talk 648:talk 624:talk 585:talk 536:talk 477:and 471:type 469:and 429:talk 390:talk 353:talk 313:talk 268:talk 219:talk 204:talk 176:talk 1070:not 1062:or 168:. 120:??? 


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.