classified (with high scoring of metadata) into one or more buckets, and the means are updated with the original seed marked as special. At the end we have groupings of documents and mean vectors that still contain the marked seed, yet also carry other related dimensions relevant to the cluster. After completion, accumulate the means into a single list of unique dimensions and counts. Is this clustering or classification now? It certainly still has the hill-climbing property of k-means, yet the outputs are a set of dimensions, counts and membership weights that can be used for faceting the documents (possibly with facets they didn't explicitly contain a priori!). I'm cheating here: I wrote a paper and code (with co-workers) extending this method to hierarchical clustering.
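The seeded k-means variant described in this thread (one bucket per facet label, metadata weighted heavily, multiple membership allowed, means accumulated into dimension counts) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper or code mentioned above: the `meta:` prefix, the `META_BOOST` weight, the `threshold` for multiple membership and the use of cosine similarity are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict

META_BOOST = 3.0  # hypothetical: how heavily metadata terms are weighted


def featurize(terms, metadata):
    """Sparse bag-of-words vector; metadata values get a 'meta:' prefix and a boost."""
    vec = Counter(terms)
    for value in metadata:
        vec["meta:" + value] += META_BOOST
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def seeded_kmeans(docs, facet_labels, threshold=0.5, iterations=5):
    """k-means seeded (non-randomly) with one bucket per facet label.

    A document joins every bucket whose mean it matches with cosine >= threshold
    (falling back to the single best bucket), so membership is not exclusive.
    """
    seed_dims = {label: "meta:" + label for label in facet_labels}
    means = {label: {dim: 1.0} for label, dim in seed_dims.items()}
    members = {}
    for _ in range(iterations):
        members = {label: {} for label in facet_labels}  # label -> {doc_id: weight}
        for doc_id, vec in docs.items():
            sims = {label: cosine(vec, mean) for label, mean in means.items()}
            chosen = {lab: s for lab, s in sims.items() if s >= threshold}
            if not chosen:
                best = max(sims, key=sims.get)
                chosen = {best: sims[best]}
            for label, s in chosen.items():
                members[label][doc_id] = s
        # Recompute each mean from its members, keeping the seed dimension marked.
        for label in facet_labels:
            ids = members[label]
            mean = defaultdict(float)
            for doc_id in ids:
                for term, w in docs[doc_id].items():
                    mean[term] += w / len(ids)
            seed = seed_dims[label]
            mean[seed] = max(mean[seed], 1.0)
            means[label] = dict(mean)
    # Accumulate the means into one list of unique dimensions and counts
    # (count = number of buckets whose mean contains that dimension).
    dim_counts = Counter()
    for mean in means.values():
        dim_counts.update(mean.keys())
    return means, members, dim_counts


docs = {
    "d1": featurize(["novel", "author", "paperback"], ["books"]),
    "d2": featurize(["album", "vinyl", "author"], ["music"]),
    "d3": featurize(["audiobook", "novel", "album"], ["books", "music"]),
}
means, members, dims = seeded_kmeans(docs, ["books", "music"])
print(sorted(members["books"]))  # ['d1', 'd3']: d3 is in both buckets
```

The membership weights in `members` and the counts in `dims` are the kind of output the comment suggests could drive facets; lowering `threshold` lets documents bleed into buckets whose labels they never carried.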
results, classifications of results against external data such as taxonomies and folksonomies, etc. In my opinion, back in the day Oren Etzioni's Grouper was doing faceted search via simple clustering of results, as was Northern Light by faceting results via a pre-computed taxonomy (the loose definition; see below). I'm also not sure that doing an imprecise classification into a taxonomy implies that an object could not have a measure of fitness for more than one node of the taxonomy, or that the word 'taxonomy' must be taken to mean that nodes/leaves of the tree can live in only one place in the tree.
needs to limit itself to data to which a faceted classification has been applied, or it needs to drop the part about mass-market search, since that is merely about offering useful limiters on the page. Also, there's nothing faceted about WorldCat AFAIK, other than its use of FAST, and because FAST does not link the facets, it is actually a removal of facets rather than an application of them. In other words, I think this article is highly problematic, at best. Even the references are poor. I seriously doubt the information presented here.
about what gets put together, whereas with facets the documents are grouped in semantically meaningful ways. This is because a human is in the loop. A human says, "What types of metadata can we talk about? Let's group by those." Conversely, in clustering, a machine puts the documents into groups and then a human asks, "Okay, so why are these documents grouped together?" We've seen before (I'm sure I can find a citation if requested) that labeling clusters of documents is tricky, and it's even trickier to get machines to do it well.
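To make the labeling problem concrete, here is a deliberately naive labeler (an illustrative baseline, not a method anyone in this thread proposed): it names a cluster after the terms that occur in the most member documents. The function name, the `k` parameter and the optional `stopwords` set are hypothetical. Frequent but uninformative terms can easily win, which is one reason machine-generated cluster labels often need a human to interpret them.

```python
from collections import Counter


def naive_cluster_label(cluster_docs, k=2, stopwords=frozenset()):
    """Label a cluster with the k terms found in the most member documents.

    cluster_docs: list of token lists, one per document in the cluster.
    """
    doc_freq = Counter()
    for tokens in cluster_docs:
        doc_freq.update(set(tokens) - stopwords)  # count each term once per document
    return [term for term, _ in doc_freq.most_common(k)]


cluster = [
    ["the", "acme", "widget", "blue"],
    ["the", "acme", "widget", "red"],
    ["the", "acme", "gadget"],
]
print(naive_cluster_label(cluster))                     # includes the uninformative "the"
print(naive_cluster_label(cluster, stopwords={"the"}))  # ['acme', 'widget']
```

Even with stopwords removed, the label says what the documents share, not why a human would consider them a meaningful group.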
classification system..." This clause, "according to a faceted classification system", uses "faceted" to describe "faceted search", which is horrible English because you have not defined the term "faceted". When using technical terms, you need to define what they mean; otherwise you lose all the people who want to read this article but do not have the technical background specific to this topic.
(my commercial implementations suffered from these issues), once metadata is included as a heavy factor and multiple assignment is allowed, the results can become indistinguishable from faceting. Both techniques are functions that produce a filtering/grouping of results by some label, where the assignment of labels to documents is governed by some extraction + classification task.
click one of the blue right arrows. Now you have a list of things related to Bill Clinton that belong only in that category. You got there by filtering on Category (you can also filter on information via the 'About' tab). Click the 'Your query' link to view the filters; the controls there allow you to remove and add filters.
Please read it again. I took pains to state that clustering does not have to imply single membership. It's possible to build a faceting algorithm that uses a clustering approach and pre-labeled facets. There is a strong tendency to assume that clustering implies singular membership in a cluster and that
2. Facets group the documents in semantically meaningful ways. We know this because the facets that describe the documents are manually chosen. In clustering, documents are automatically grouped based on some similarity metric, such as cosine similarity, correlation, or probabilistic methods.
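For readers unfamiliar with the similarity metrics mentioned in point 2, cosine similarity over bag-of-words term counts can be computed as below. This is a minimal sketch; whitespace tokenization and raw counts (no TF-IDF weighting) are simplifying assumptions, and the function names are illustrative.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("Acme blue widget")
doc2 = bag_of_words("Acme red widget")
doc3 = bag_of_words("garden hose")
print(cosine_similarity(doc1, doc2))  # about 0.667: two of three terms shared
print(cosine_similarity(doc1, doc3))  # 0.0: no terms shared
```

A clustering algorithm groups documents whose vectors score high under such a measure; the score itself carries no semantics about why they are similar.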
of the information being searched? If so, do we agree that not all ways of organizing result sets use faceted classification? Specifically, can we agree that using a pairwise document similarity measure to arrange documents into groups doesn't use a faceted classification? And that neither does using
It seems that a recurring issue on all Knowledge (XXG) entries related to search is that companies want to be mentioned in the entry (see the previous two sections as examples). I've included only a handful of companies in this entry, ones that are not only notable enough to have Knowledge (XXG) entries,
Regarding the addition of razorbase.com to the list, the article says that FBs "allow users to explore by filtering available information". Go to the home page, type "Bill Clinton", click the 'named' link, then choose 'connected to'. Now in the resulting page, click the Categories tab, then
This smells like a straw-man argument to me. Are you assuming that clustering means each document lives in only one group? Or that one could not do a first-level clustering based upon metadata (as faceting uses)? While I agree that vanilla clustering has the problems she described
As a software engineer and consultant with over 20 years of experience, I have dealt with stakeholders from all walks of life: secretaries, CEOs, managers, factory-floor assembly workers, engineers, etc. Those are the people Knowledge (XXG) needs to reach, and this article surely does not
Nonetheless, I refuse to let this entry become overrun with mentions of companies that don't meet the above criteria; that quickly devolves into spam. I'd sooner remove all company mentions, and even mentions of open-source software if those are controversial too. I've been maintaining this page
locations. You don't have to decide whether an Acme Widget is more "Acme" or more "Widget"; it's both, because it 'is' both. However, this doesn't explain where the facets come from. When we cluster documents together, we're just putting together statistically similar documents. There's no semantics
Maybe this boils down to the interpretation/semantics of the words 'clustering' and 'faceting'. If one assumes that faceting implies multiple membership (dimensions) in metadata, and that clustering must be interpreted as the application of some classic-ish clustering algorithm, then I agree with
In the distinction made between "multiple classifications" and a "single, pre-determined, taxonomic order," I'm not sure that "taxonomic" should be used to describe a hierarchical type of organization, since taxonomies are not generally required to be "single" or "pre-determined" or hierarchical, as is
I disagree with the statement that faceted search is search against data organized with a faceted classification. There are very, very few actual faceted classifications in use, and most online sites with facets are simply using regular metadata attributes to provide limits. So this article either
The very first question on this page is about the distinction between faceted metadata and taxonomy. Maybe we could make the differences more specific, like this: 'facets might include topics, subjects, or concepts (like traditional taxonomies), but facets are not limited to those elements'. For
I don't think faceted search implies this distinction at all! How about defining faceted search as a method of classifying search results according to one or more methods of extracting patterns from the results? Examples include patterns derived from text-clustering the results, linked attributes of
I agree with your points and I'm mostly on the same page with you. However, one could define a pairwise document similarity that uses metadata as a highly weighted feature in the computation. Example: imagine k-means that is seeded (non-randomly) with n buckets of n "facet-labels". Each document is
The wording of this article is recursive, which is very poor English. First, the article needs to begin with a simple definition of faceted search. "Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted
New distinction via Dtunkelang (private communication): facets are defined to be key-value pairs, where a document can be associated with multiple keys and each key may have multiple value assignments. Machine assignment is allowable. Since classic clustering approaches (bag of words) are not
I talked with Sherman Monroe (who wrote the previous comment) about the razorbase.com link. I am still not convinced either that it is a faceted browser or that, even if it were, it would be an appropriate external link. We've moved our private dialogue to the talk page to promote public
example, few taxonomies would include structures for price, size, compatibility or date, while these are common facets in online catalogs. That way we could remove the question of taxonomy structure and concentrate on the meaningful differences between various data structures.
I guess I'm arguing for a definition of faceting that supports any suitable method of assigning documents to a set of n distinct dimensions and subsequently allowing a UI to filter based upon that membership, without being unnecessarily pejorative toward 'clustering algorithms'.
I'm going to start taking a hard line on external links: no links to pages that are just example applications, and no purely commercial links. Links should either be to free, open-source software or to educational materials. Knowledge (XXG) is not a sales and marketing tool.
that I agree with. There is a looser category of exploratory search interfaces, but I don't think we should call them all faceted if they're not. What does faceted mean, if not that there are multiple facets? I'm not knocking other approaches, just trying to be precise.
structured into multiple keys, it's not faceting. This definition does NOT preclude the use of clustering algorithms to infer or generate new key-value assignments for documents. While faceting is a form of result refinement, not all result refinement is faceting.
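The key-value definition of facets can be illustrated with a small in-memory model. This is a hypothetical sketch (the documents, function names and the idea of a classifier calling `assign` are illustrative assumptions): each document maps each facet key to a set of values, facet counts are computed over the current result set, and refinement keeps only documents matching every selected key-value pair.

```python
from collections import Counter

# Each document maps facet keys to *sets* of values (multiple keys, multiple values).
docs = {
    "acme-widget": {"brand": {"Acme"}, "type": {"widget"}, "color": {"red", "blue"}},
    "acme-gadget": {"brand": {"Acme"}, "type": {"gadget"}},
    "zeta-widget": {"brand": {"Zeta"}, "type": {"widget"}, "color": {"red"}},
}


def assign(docs, doc_id, key, value):
    """Add a key-value pair, e.g. one inferred by a (hypothetical) classifier or clusterer."""
    docs[doc_id].setdefault(key, set()).add(value)


def refine(docs, filters):
    """Keep documents that have every selected (key, value) pair."""
    return {
        doc_id: facets
        for doc_id, facets in docs.items()
        if all(value in facets.get(key, set()) for key, value in filters.items())
    }


def facet_counts(docs, key):
    """How many documents in the current result set carry each value of `key`."""
    counts = Counter()
    for facets in docs.values():
        counts.update(facets.get(key, set()))
    return counts


results = refine(docs, {"type": "widget"})
print(sorted(results))                 # ['acme-widget', 'zeta-widget']
print(facet_counts(results, "color"))  # Counter({'red': 2, 'blue': 1})
assign(docs, "acme-gadget", "color", "red")  # a machine-assigned value
print(sorted(refine(docs, {"color": "red"})))
```

Nothing here depends on how a value was assigned, which is why the definition leaves room for clustering to generate new key-value pairs.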
It is truly shameful that a person can't look at this article and grasp in the first two sentences what "faceted browsing" or "faceted searching" or "faceted navigation" means in simple terms. I consider that real laziness on the part of the author.
"A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order."
Knowledge (XXG) is going to lose the interest of a wide variety of stakeholders if its articles become too academic, with nothing for the lay person to read and understand. Save the heavy academic writing for later in the article.
Looks like I'm having a little revert war with 78.105.108.216 over the inclusion of the following sentence: "Newer solutions employing faceted search are increasingly being offered to retailers by companies such as
That does it. I've eliminated all references to companies, including ones I feel are worth including. Hopefully everyone can live with this as a fair solution. I refuse to let this entry become a cesspool of spam.
to produce labels for the clusters. The point of the k-means-derived algorithm was to demonstrate this idea. I'm arguing that 'clustering' should not be used as a straw man to define faceting as NOT clustering.
1. Documents are categorized into multiple orthogonal hierarchies, whereas in clustering a document belongs to only one cluster. Even in hierarchical clustering, the document belongs to only one hierarchy.
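Point 1 can be made concrete by contrasting a hard cluster assignment, where each document takes only its best-scoring cluster, with facet paths in several independent hierarchies. The scores, hierarchies and helper names below are hypothetical, and the hard assignment is only the "vanilla" case; as argued elsewhere in this thread, clustering can also be made to allow multiple membership.

```python
# Vanilla (hard) clustering: each document ends up in exactly one cluster.
cluster_scores = {
    "acme-widget": {"c1": 0.7, "c2": 0.6},
    "zeta-widget": {"c1": 0.2, "c2": 0.9},
}


def hard_assign(scores):
    """Assign every document to its single best-scoring cluster."""
    return {doc: max(s, key=s.get) for doc, s in scores.items()}


# Faceted: each document has a path in several orthogonal hierarchies.
facet_paths = {
    "acme-widget": {"brand": ("Acme",), "type": ("Hardware", "Widgets")},
    "zeta-widget": {"brand": ("Zeta",), "type": ("Hardware", "Widgets")},
    "acme-manual": {"brand": ("Acme",), "type": ("Documents", "Manuals")},
}


def docs_under(paths, facet, prefix):
    """Documents whose path in `facet` starts with `prefix` (a drill-down)."""
    prefix = tuple(prefix)
    return sorted(
        doc for doc, facets in paths.items()
        if facets.get(facet, ())[: len(prefix)] == prefix
    )


print(hard_assign(cluster_scores))                    # one cluster per document
print(docs_under(facet_paths, "brand", ["Acme"]))     # acme-widget appears here...
print(docs_under(facet_paths, "type", ["Hardware"]))  # ...and here too
```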
This is a classic example of computer science trying to be excessively abstract to prove sophistication and just muddling up a simple concept. Just say "search attributes" like they did in the 1980s.
Nealrichter, I actually think you have boiled down a useful technical description of the basic distinction. I realize there are other elements, but your formulation (with notes) works for me:
We have a good working definition now that faceting means multiple membership of documents to multiple possibly orthogonal facets and that the facets should have some semantic meaning.
but have established associations with faceted search. Since I have a past affiliation with Endeca, I could be accused of bias, but I count on others to keep the entry honest.
Such solutions can enable faceted search results to be ordered based on their relevance, rather than simply filtered in or out entirely." I think it's spammy, and that
I'm concerned that any site that is an "example of faceted search" might show up in the external links. Can we agree on a standard of notability and/or content type?
faceting implies multiple membership (dimensions) in meta-data and clustering must be interpreted as being the application of some classic-ish clustering-algorithm
the cluster labeling problem is too hard. Neither of these is ironclad if you allow multiple membership in clusters and use the same metadata used by
Over half the citations and a big gob of the text are devoted to the work of one researcher, plus colleagues. One of the cites might be to self-published work.
a single hierarchical topic organization? If I understand what you mean about faceting implying multiple membership, then we're all on the same page.
should probably be re-added to the current article. Does anyone know of a good reason why that information was removed and therefore why it should
Can you suggest more precise wording? The distinction is an important one, and I think the only issue is finding the precise words to describe it.
IT personnel have been accused for decades of not being able to communicate well. This article is a clear example of that.
An introduction should be simple, not delving deep into the topic. That should be saved for the following paragraphs.
Over the years, many academic references, descriptions, and other ‘non-spammy’ details were removed, e.g.:
Do facets only apply to information elements stored in a predefined order? (implied as part of the definition here)
like Dtunkelang said. That means that, unlike the clustering approach outlined by Nealrichter above, a document
that was redirected here; it is not clear how much of that article’s content was actually migrated, however.
on
Knowledge (XXG). If you would like to participate, please visit the project page, where you can join
Your points do indeed deserve more attention, and seem to be somewhat related to other comments above.
with something of an iron fist, but I'm open to discussion here if anyone disagrees with my approach.
but all of the information regarding non-website applications appears to have since been removed.
live in exactly one location, but rather exists in multiple locations. If we had two facets,
which is marked as an orphan, isn't notable enough for inclusion. I'm being accused of
Perhaps others without any real or perceived conflicts of interest can chime in.
appear specifically focused on computer programming theory, for example. Perhaps
We shouldn't ignore the other, broader, issues, but this is a darn good start.
This can lead to clusters that appear to be noisy when judged by human users.
to the subject of this article. Relevant policies and guidelines may include
These database concepts are ancient. Why does this new terminology exist?
Marti Hearst makes a distinction between clustering and faceted search in
Let me try a different tack: do we agree that faceted search assumes a
I think the fundamental thing that faceted search requires is the
"Clustering versus faceted categories for information exploration"
a classification? (probably a specific value of a "dimension")
Aren't facets just database views, searches, and ordering?
contributor may be personally or professionally connected
Hearst. Yet those two words have broader meanings.
a collaborative effort to improve the coverage of
Finally, there was also a separate article titled
This article has not yet received a rating on the
At the very least, a reference to something like
Related: Does Knowledge (XXG) have an existing,
So in conclusion: No, clustering is not facets.
Category:Library cataloging and classification
last revision prior to blanking and redirect
This page was long ago moved from the title
then an Acme Widget exists in both the
The language is abstract and ambiguous:
FS different from faceted classification
Parametric search (Information science)
Unknown-importance Computing articles
Knowledge (XXG):WikiProject Computing
Articles with connected contributors
because of my past affiliation with
This article is within the scope of
It is of interest to the following
another example of removed content
Parametric search (user interface)
Category:Knowledge representation
11:31, September 20, 2021 (UTC)
last revision prior to page move
Start-Class Computing articles
razorbase as a faceted browser
The following Knowledge (XXG)
Template:WikiProject Computing
Informative Faceted Searching
be restored in some fashion?
(Most of the items listed at
12:17, 4 September 2011 (UTC)
16:03, 30 December 2009 (UTC)
and see a list of open tasks.
that we could link to here?
14:10, 7 December 2009 (UTC)
18:41, 2 February 2009 (UTC)
19:23, 20 January 2009 (UTC)
06:10, 31 December 2014
01:57, 5 January 2010 (UTC)
Information search methods
Category:Search algorithms
a dimension of an element?
project's importance scale
01:50, 23 July 2022 (UTC)
Faceted semantic browsers
Faceted semantic browsers
00:08, 23 July 2022 (UTC)
outline or summary of
may provide more leads?)
13:37, 13 May 2010 (UTC)
and my present one with
15:12, 19 May 2009 (UTC)
15:06, 19 May 2009 (UTC)
18:10, 21 May 2009 (UTC)
02:47, 22 May 2009 (UTC)
23:45, 21 May 2009 (UTC)
22:08, 21 May 2009 (UTC)
19:21, 21 May 2009 (UTC)
18:38, 21 May 2009 (UTC)
18:10, 21 May 2009 (UTC)
16:39, 21 May 2009 (UTC)
15:47, 21 May 2009 (UTC)
15:19, 21 May 2009 (UTC)
Multidimensional search
an information element?
20:07, 8 May 2015 (UTC)
18:19, 7 May 2010 (UTC)
Razorbase External Link
All Computing articles
faceted classification
faceted classification
faceted classification
information technology
This article is rated
communicate to them.
neutral point of view
WikiProject Computing
on Knowledge (XXG)'s
A subsection titled
conflict of interest
Computing articles
content assessment
seems warranted.
Search attributes
Faceted browsers
(see, e.g., the
Faceted browser
BenjaminGSlade
External Links
implied here.
the discussion
non-technical
Too technical
76.73.133.188
128.114.60.40
autobiography
So what is:
discussion.
69.91.164.31
WikiProjects
Jim Grisham
84.24.63.85
WP:CONFLICT
Searchtools
Nealrichter
type=widget
Start-class
Categories
c. 2008)
Dtunkelang
PrismaStar
brand=Acme
Yakushima
Computing
computing
computers
concerns
WP:UNDUE
does not
Taxonomy
undated
LaMona
Google
Endeca
scale.
brand
type
not
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.