Manual searching may be performed using a roughly similar procedure, though this is often done unconsciously. Other advantages are that one can easily find the first or last elements on the list (most likely to be useful for numerically sorted data), or elements in a given range (useful again for numerical data, and also for alphabetically ordered data when one is sure of only the first few letters of the sought item or items).
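As a minimal sketch of these advantages, the Python fragment below uses the standard `bisect` module on an already sorted list to test membership and to pull out every item that shares a known prefix. The names `contains` and `with_prefix` are illustrative only, and the sketch assumes plain code-point ordering rather than a language-aware collation.

```python
from bisect import bisect_left

def contains(sorted_items, target):
    """Binary search: report whether target occurs in the sorted list."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

def with_prefix(sorted_items, prefix):
    """Return the contiguous run of items that start with prefix."""
    lo = bisect_left(sorted_items, prefix)
    # Assumption: no item has U+10FFFF immediately after the prefix,
    # so this sentinel sorts after every string that starts with it.
    hi = bisect_left(sorted_items, prefix + "\U0010FFFF")
    return sorted_items[lo:hi]

words = sorted(["cat", "carthorse", "dog", "carp", "cart", "carbon"])
first, last = words[0], words[-1]      # the ends of the list are immediate
in_list = contains(words, "cart")      # True
absent = contains(words, "cow")        # False
car_words = with_prefix(words, "car")  # carbon, carp, cart, carthorse
```

Because the list is sorted, all items with a common prefix are adjacent, which is why two binary searches are enough to delimit them.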
This ordering deviates from standard alphabetical order, chiefly because all capital letters are ordered before all lower-case ones (and possibly because of the treatment of spaces and other non-letter characters). It is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons) before comparison of ASCII values.
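The effect of converting case before comparing character codes can be sketched in Python, whose default string comparison works on code points. The symbols are the ones used in the ASCII example of this article; the variable names are illustrative.

```python
symbols = ["a", "b", "C", "d", "$"]

# Plain code ordering: '$' (36) < 'C' (67) < 'a' (97) < 'b' (98) < 'd' (100).
by_code = sorted(symbols)
# by_code == ['$', 'C', 'a', 'b', 'd']

# Converting to uppercase before comparison removes the capital/lower-case split.
by_code_ignoring_case = sorted(symbols, key=str.upper)
# by_code_ignoring_case == ['$', 'a', 'b', 'C', 'd']
```

Only the comparison key is converted; the original strings are returned unchanged.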
order is decided. (If one string runs out of letters to compare, then it is deemed to come first; for example, "cart" comes before "carthorse".) The result of arranging a set of strings in alphabetical order is that words with the same first letter are grouped together, and within such a group words with the same first two letters are grouped together, and so on.
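The letter-by-letter comparison described here can be sketched as follows. This is an illustration only: it assumes strings made solely of the 26 lower-case letters in `ALPHABET`, and the names `RANK` and `comes_first` are hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Position of each letter in the standard ordering of the alphabet.
RANK = {letter: position for position, letter in enumerate(ALPHABET)}

def comes_first(a, b):
    """Return whichever of the two strings comes first alphabetically."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing letter decides the order.
            return a if RANK[x] < RANK[y] else b
    # One string ran out of letters: the shorter one comes first.
    return a if len(a) <= len(b) else b

earlier = comes_first("carthorse", "cart")  # "cart"
```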
– a sequence in which the characters are assumed to come for the purpose of collation – as well as other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters,
that allows the information to be sorted in a satisfactory manner for the application in question. Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections. However, not all of these criteria are easy to automate.
To decide which of two strings comes first in alphabetical order, initially their first letters are compared. The string whose first letter appears earlier in the alphabet comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on, until the
In some contexts, numbers and letters are used not so much as a basis for establishing an ordering, but as a means of labeling items that are already ordered. For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way. Labeling series
in Chinese and logographic systems derived from Chinese. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese
or other word dividers, the decision must be taken whether to ignore these dividers or to treat them as symbols preceding all other letters of the alphabet. For example, if the first approach is taken then "car park" will come after "carbon" and "carp" (as it would if it were written "carpark"),
The radical-and-stroke system is cumbersome compared to an alphabetical system in which there are a few characters, all unambiguous. The choice of which components of a logograph comprise separate radicals and which radical is primary is not clear-cut. As a result, logographic languages often
may be sorted based on the values of the numbers that they represent. For example, "−4", "2.5", "10", "89", "30,000". Pure application of this method may provide only a partial ordering on the strings, since different strings can represent the same number (as with "2" and "2.0" or, when
(or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example
), with the symbols being ordered in increasing numerical order of their codes, and this ordering being extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking,
. The ordering of the strings relies on the existence of a standard ordering for the letters of the alphabet in question. (The system is not limited to alphabets in the strict technical sense; languages that use a
The main advantage of collation is that it makes it fast and easy for a user to find an element in the list, or to confirm that it is absent from the list. In automatic systems this can be done using a
in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a
In several languages the rules have changed over time, and so older dictionaries may use a different order than modern ones. Furthermore, collation may depend on use. For example, German
and deciding which should come before the other. When an order has been defined in this way, a sorting algorithm can be used to put a list of any number of items into that order.
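A pairwise comparison rule of this kind can be handed to a general-purpose sorting routine. The sketch below uses Python's `functools.cmp_to_key` with a hypothetical `compare` function that ignores case; any other rule that decides which of two strings comes first could be substituted.

```python
from functools import cmp_to_key

def compare(a, b):
    """Negative if a comes first, positive if b comes first, 0 if tied."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)

items = ["pear", "Apple", "fig", "apple"]
ordered = sorted(items, key=cmp_to_key(compare))
# ordered == ['Apple', 'apple', 'fig', 'pear']
```

"Apple" and "apple" compare as equal under this rule, so the rule itself does not place them in any defined order; Python's sort is stable and simply keeps them in their input order.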
Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though '7' comes after '1' in
. This can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in
(I, II, III, ... or i, ii, iii, ...), or letters (A, B, C, ... or a, b, c, ...). (An alternative method for indicating list items, without numbering them, is to use a
. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example,
. In this case two sets of strings can be stored, one for display purposes, and another for collation purposes. Strings used for collation in this way are called
comes first. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way.
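A minimal sketch of surname-first ordering in Python follows. It makes the simplifying assumption that the surname is the last space-separated word of the name, which does not hold for many real names; `surname_key` and `listing_form` are illustrative helpers.

```python
def split_name(full_name):
    """Split into (given names, surname), assuming the surname is the last word."""
    given, _, surname = full_name.rpartition(" ")
    return given, surname

def surname_key(full_name):
    """Sort key: surname first, then given names, ignoring case."""
    given, surname = split_name(full_name)
    return (surname.casefold(), given.casefold())

def listing_form(full_name):
    """Render the name the way it is ordered, e.g. 'Hernandes, Juan'."""
    given, surname = split_name(full_name)
    return f"{surname}, {given}" if given else surname

names = ["Brian O'Leary", "Juan Hernandes"]
ordered = sorted(names, key=surname_key)
# ordered == ['Juan Hernandes', "Brian O'Leary"]
listed = [listing_form(name) for name in ordered]
# listed == ['Hernandes, Juan', "O'Leary, Brian"]
```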
is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.
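As one possible application-specific rule, and not a universal answer, the sketch below compares runs of digits as integers and everything else as text. It handles embedded integers such as "Figure 7b" and dotted labels such as "Section 3.2.5", but it deliberately does not interpret decimal fractions or locale-specific separators. The name `natural_key` is hypothetical.

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group always yields text at even
    positions and digit runs at odd positions, so keys built this
    way compare text with text and integers with integers.
    """
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]

figures = sorted(["Figure 11a", "Figure 7b", "Figure 7a"], key=natural_key)
# figures == ['Figure 7a', 'Figure 7b', 'Figure 11a']

sections = sorted(["Section 3.10", "Section 3.2.5", "Section 3.2"],
                  key=natural_key)
# sections == ['Section 3.2', 'Section 3.2.5', 'Section 3.10']
```

Under this rule "3.10" is read as section three, subsection ten, so it sorts after "3.2"; an application that treats the same text as a decimal number would need the opposite result.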
When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation
as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as in
, whose thousands of symbols defy ordering by convention. In this system, common components of characters are identified; these are called
supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word
are typically treated as equivalent to their corresponding lowercase letters. (For alternative treatments in computerized systems, see
) are often ordered as if they were written out as "Saint". There is also a traditional convention in English that surnames beginning
, although they are now alphabetized as two-letter combinations. A list of such conventions for various languages can be found at
Abbreviations may be treated as if they were spelt out in full. For example, names containing "St." (short for the English word
In some applications, the strings by which items are collated may differ from the identifiers that are displayed. For example,
In many collation algorithms, the comparison is based not on the numerical codes of the characters, but with reference to the
is the basis for many systems of collation where items of information are identified by strings consisting principally of
whereas in the second approach "car park" will come before those two words. The first rule is used in many (but not all)
Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in
character 妈 (meaning "mother") is sorted as a six-stroke character under the three-stroke primary radical 女.
article. Such algorithms are potentially quite complex, possibly requiring several passes through the text.
Sorting decimals properly is a bit more difficult, because different locales use different symbols for a
Strings that represent personal names will often be listed by alphabetical order of surname, even if the
on the set of items of information (items with the same identifier are not placed in any defined order).
Certain limitations, complications, and special conventions may apply when alphabetical order is used:
is the assembly of written information into a standard order. Many systems of collation are based on
This article is about collation in library, information, and computer science. For other uses, see
, there are certain language-specific conventions as to which letters are used. For example, the
(so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo).
is a convention in some official documents where people's names are listed without hierarchy.
, can use the same ordering principle provided there is a set ordering for the symbols used.)
: Charts demonstrating language-specific sorting orders in various operating systems and DBMS
The simplest kind of automated collation is based on the numerical codes of the symbols in a
, or extensions and combinations thereof. Collation is a fundamental element of most office
A standard algorithm for collating any collection of strings composed of any standard
or other items that can be ordered chronologically or in some other natural fashion.
on a set of possible identifiers, called sort keys, which consequently produces a
(東京) can be sorted as if it were spelled out in the Japanese characters of the
Bulletin of the School of
Oriental and African Studies, University of London
Historically, computers only handled text in uppercase (this dates back to
: An online demonstration of sorting in different languages that uses the
-u" (とうきょう), using the conventional sorting order for these characters.
"Review of A Dictionary of Modern Written Arabic (Arabic-English)"
Collation of the names of the member states of the United Nations
, particular abbreviations, and so on, as mentioned above under
would be sorted as if spelled out "seventeen seventy-six", and
were formerly (until 1994) treated as basic letters following
(which in writing are only used for modifying the preceding
defines an order through the process of comparing two given
A similar approach may be taken with strings representing
Formally speaking, a collation method typically defines a
When letters of an alphabet are used for this purpose of
, are omitted. Also in many languages that use extended
Alphabetical order § Language-specific conventions
in English, are often ignored for sorting purposes. So
In addition, Chinese characters can also be sorted by
, used for non-alphabetic writing systems such as the
above), but it may still be desired to display it as
Assembly of written information into a standard order
). So a computer program might treat the characters
would be sorted as just "Shining" or "Shining, The".
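Ignoring a leading article amounts to sorting on a derived key while displaying the original title. The sketch below is an illustration: only "The" is mentioned in this article, so the other entries in `ARTICLES` and the name `title_sort_key` are assumptions.

```python
# Leading words to ignore; "a" and "an" are assumed additions.
ARTICLES = ("the ", "a ", "an ")

def title_sort_key(title):
    """Return the string a title is collated by, without a leading article."""
    lowered = title.casefold()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered

titles = ["Taxi Driver", "The Shining", "Alien"]
plain = sorted(titles)
# plain == ['Alien', 'Taxi Driver', 'The Shining']
collated = sorted(titles, key=title_sort_key)
# collated == ['Alien', 'The Shining', 'Taxi Driver']
```

The titles themselves are never altered; "The Shining" is merely positioned as if it were "Shining".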
, they may be sorted as if they were those letters.
'fate,' or 'written'), are agglomerated under the
Languages have different conventions for treating
and certain letter combinations. For example, in
would be sorted before strings with lower-case
, and sometimes the same character used as a
are listed as if those prefixes were written
Typographical collation for many languages
= 100). Therefore, strings beginning with
, Richard F. Walters, Digital Press, 1997
is treated as a basic letter following
"Hans Wehr Arabic-English Dictionary"
A Dictionary of Modern Written Arabic
International Components for Unicode
, as proposed in the List module of
M Programming: A Comprehensive Guide
are often not used in enumeration.
(the corresponding ASCII codes are
Very common initial words, such as
that may be used include ordinary
A collation algorithm such as the
, group and sort Arabic words by
When some of the strings contain
: Unicode Technical Standard #10
, etc. This is sometimes called
to arrange the items by class.
as different letters, placing
Common Locale Data Repository
Another form of collation is
), which denotes 'writing'.
is used, "2e3" and "2000").
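Sorting numeric strings by the values they represent can be sketched with Python's `decimal` module. The cleaning step is an assumption about the input: it treats "," as a thousands separator and accepts the typographic minus sign (U+2212); `numeric_key` is an illustrative name.

```python
from decimal import Decimal

def numeric_key(text):
    """Convert a numeric string to its value for comparison."""
    # Assumed conventions: ',' groups thousands, U+2212 is a minus sign.
    cleaned = text.replace(",", "").replace("\u2212", "-")
    return Decimal(cleaned)

values = ["10", "\u22124", "30,000", "2.5", "89"]
ordered = sorted(values, key=numeric_key)
# ordered == ['\u22124', '2.5', '10', '89', '30,000']

# Different strings can represent the same number, so the order is only partial.
same_value = numeric_key("2") == numeric_key("2.0")        # True
same_notation = numeric_key("2e3") == numeric_key("2000")  # True
```

Strings that tie, such as "2" and "2.0", are left in whatever relative order the sorting routine gives them unless a secondary rule is added.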
Unicode Collation Algorithm
Unicode Collation Algorithm
Unicode Collation Algorithm
Numerical and chronological
Unicode collation algorithm
Abu-Haidar, J. A. (1983).
. This can be extended to
radical-and-stroke sorting
Radical-and-stroke sorting
use different approaches.
Collation (disambiguation)
Labeling of ordered items
. For example, the words
August 30, 2005, at the
Chinese character orders
lexicographical ordering
Chinese character orders
does this when sorting
, and in detail in the
surname stroke ordering
binary search algorithm
Collation differs from
Cascading Style Sheets
dictionaries the word
coding (or any of its
syllabary as "to-u-ki-
dictionaries, such as
telephone directories
telephone directories
When strings contain
Strings representing
Collation in Spanish
), and usually also
. In Greater China,
stroke-based sorting
for the movie title
interpolation search
ICU Locale Explorer
Unicode equivalence
Mac and Mc together
Issues with numbers
dictionaries treat
Automated collation
scientific notation
2008-05-11 at the
2006-08-13 at the
Natural sort order
Taxonomic sequence
Asciibetical order
Alphabetical order
Alphabetical order
Alphabetical order
Alphabetical order
collating sequence
ASCIIbetical order
Chinese characters
Alphabetical order
Alphabetical order
alphabetical order
Microsoft Windows
as being ordered
24 heures du Mans
character strings
sorting algorithm
1163:Collation Charts
949:modified letters
900:(1, 2, 3, ...),
752:modified letters
390:modified letters
366:
301:, the second in
152:library catalogs
1176:Wayback Machine
1148:Wayback Machine
1136:Wayback Machine
898:Arabic numerals
893:
858:
826:
812:symbols is the
627:
609:
553:
445:
279:Capital letters
246:
240:
216:
211:
156:reference books
140:numerical order
1117:External links
1115:
1112:
1111:
1094:
1070:
1051:(2): 351–353.
902:Roman numerals
892:
889:
866:Roman numerals
857:
854:
825:
822:
777:comes between
267:, for example
242:Main article:
239:
236:
215:
212:
210:
207:
180:total preorder
164:classification
148:filing systems
1018:conventions).
906:bulleted list
903:
899:
890:
888:
886:
885:decimal point
882:
881:decimal point
639:character set
455:'s bilingual
1086:. Retrieved
1082:
1073:
1048:
1044:
1034:
1010:
945:Latin script
910:
894:
878:
859:
849:
845:
838:Shining, The
837:
829:
827:
807:
802:
798:
794:
790:
783:olfaktorisch
782:
778:
774:
768:
747:
745:
738:
734:
730:
726:
722:
718:
714:
710:
706:
702:
698:
694:
690:
686:
682:
678:
674:
670:
666:
662:
658:
636:
628:
612:
599:
596:
570:
568:
556:
516:
506:
505:'library'),
496:
486:
476:
475:'writing'),
466:
463:semitic root
456:
446:
443:Root sorting
433:dictionaries
430:
420:
416:
412:
408:
401:
397:
380:
376:
368:
354:
341:
337:
323:
319:
315:
309:
299:dictionaries
287:
277:
273:
247:
238:Alphabetical
913:enumeration
846:The Shining
830:The Shining
515:'office'),
495:'writer'),
396:the letter
343:The Shining
176:total order
72:"Collation"
1088:2023-06-04
1083:ejtaal.net
1026:References
874:file names
775:ökonomisch
713:= 67, and
641:, such as
625:Automation
527:triliteral
404:, and the
331:given name
285:, below.)
1199:Collation
1057:0041-977X
1016:telegraph
929:consonant
850:sort keys
832:might be
824:Sort keys
647:supersets
631:algorithm
557:See also
485:'book'),
453:Hans Wehr
261:syllabary
136:Collation
1193:Category
1172:Archived
1144:Archived
1132:Archived
955:See also
919:letters
785:, while
779:offenbar
756:digraphs
649:such as
604:hiragana
591:radicals
587:Japanese
581:and the
406:digraphs
351:numerals
269:Cherokee
257:alphabet
255:from an
209:Ordering
976:Sorting
917:Russian
862:Unicode
810:Unicode
801:before
787:Turkish
651:Unicode
579:Chinese
497:maktaba
394:Spanish
265:abugida
253:letters
220:numbers
97:scholar
1065:615409
1063:
1055:
947:, the
939:, and
834:sorted
771:German
709:= 98,
705:= 97,
701:= 36,
673:, and
517:maktūb
507:maktab
467:kitāba
449:Arabic
294:spaces
154:, and
1182:with
1061:JSTOR
1002:Notes
840:(see
725:, or
643:ASCII
600:Tōkyō
583:kanji
575:hanzi
544:ك ت ب
529:root
522:مكتوب
502:مكتبة
487:kātib
477:kitāb
472:كتابة
447:Some
382:Seven
377:Se7en
311:Saint
232:dates
104:JSTOR
90:books
1053:ISSN
923:and
803:öbür
799:oyun
793:and
781:and
561:and
512:مكتب
492:كاتب
482:كتاب
435:and
419:and
411:and
373:leet
371:for
369:1337
356:1776
318:and
76:news
908:.)
836:as
585:of
577:of
375:or
338:The
324:Mac
263:or
201:or
142:or
59:by
1195::
1097:^
1081:.
1059:.
1049:46
1047:.
1043:.
935:,
876:.
852:.
820:.
805:.
754:,
733:,
721:,
699:$
693:,
689:,
685:,
681:,
679:$
675:$
669:,
665:,
661:,
608:yo
413:ll
409:ch
320:M'
316:Mc
158:.
150:,
1159:.
1091:.
1067:.
941:Ё
937:Й
933:Ы
925:Ь
921:Ъ
795:ö
791:o
735:b
731:a
727:Z
723:M
719:C
715:d
711:C
707:b
703:a
695:d
691:b
687:a
683:C
671:d
667:C
663:b
659:a
541:(
539:b
537:-
535:t
533:-
531:k
519:(
509:(
499:(
489:(
479:(
469:(
427:.
421:l
417:c
402:n
398:ñ
326:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.