Collation

Assembly of written information into a standard order

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

Collation differs from classification in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class.

Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order).

A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character strings and deciding which should come before the other. When an order has been defined in this way, a sorting algorithm can be used to put a list of any number of items into that order.
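The next sketch is a minimal Python illustration of the last two paragraphs and does not come from any particular system. The rule in `compare` is a hypothetical case-insensitive comparison chosen only for illustration; it is not the Unicode collation algorithm. The standard-library `functools.cmp_to_key` turns such a pairwise rule into something a sorting algorithm can use. The equivalent sort-key form shows how items with equal keys tie.

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    """Hypothetical collation rule: compare two strings case-insensitively.

    Returns a negative number if a comes first, a positive number if b
    comes first, and 0 if the rule does not distinguish them.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


items = ["banana", "Apple", "cherry", "apple"]

# A sorting algorithm needs nothing but this pairwise decision.
by_comparison = sorted(items, key=cmp_to_key(compare))

# Equivalent sort-key formulation. "Apple" and "apple" share the key
# "apple", so the induced order on the items is only a total preorder.
by_key = sorted(items, key=str.casefold)

print(by_comparison)  # ['Apple', 'apple', 'banana', 'cherry']
print(by_key)         # ['Apple', 'apple', 'banana', 'cherry']
```

Python's sort happens to be stable, so the tied "Apple" and "apple" keep their input order. As the text says, the collation itself places them in no defined order.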
The main advantage of collation is that it makes it fast and easy for a user to find an element in the list, or to confirm that it is absent from the list. In automatic systems this can be done using a binary search algorithm or interpolation search; manual searching may be performed using a roughly similar procedure, though this will often be done unconsciously. Other advantages are that one can easily find the first or last elements on the list (most likely to be useful in the case of numerically sorted data), or elements in a given range (useful again in the case of numerical data, and also with alphabetically ordered data when one may be sure of only the first few letters of the sought item or items).
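The search advantages described above can be sketched in Python with the standard `bisect` module, applied to a list already in plain code-point order. The helper names `contains` and `with_prefix` are illustrative. A real system would search using the same collation it sorted with.

```python
from bisect import bisect_left


def contains(sorted_items: list[str], target: str) -> bool:
    """Binary search: confirm presence or absence in O(log n) comparisons."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


def with_prefix(sorted_items: list[str], prefix: str) -> list[str]:
    """Items starting with `prefix` form one contiguous run in sorted order.

    Binary search finds where the run starts; it is then scanned to its end.
    """
    start = bisect_left(sorted_items, prefix)
    end = start
    while end < len(sorted_items) and sorted_items[end].startswith(prefix):
        end += 1
    return sorted_items[start:end]


words = sorted(["dog", "carthorse", "apple", "carp", "cart", "carbon"])
print(words[0], words[-1])         # apple dog  (first and last elements)
print(contains(words, "cart"))     # True
print(contains(words, "cat"))      # False
print(with_prefix(words, "cart"))  # ['cart', 'carthorse']
```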
Ordering

Numerical and chronological

Strings representing numbers may be sorted based on the values of the numbers that they represent. For example, the strings "−4", "2.5", "10", "89", and "30,000" are listed here in numerical order. Pure application of this method may provide only a partial ordering on the strings, since different strings can represent the same number (as with "2" and "2.0" or, when scientific notation is used, "2e3" and "2000").

A similar approach may be taken with strings representing dates or other items that can be ordered chronologically or in some other natural fashion.
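The following sketch shows numeric collation in Python. It assumes that "," is a thousands separator and that the typographic minus sign (U+2212) means "-". Both are assumptions about the input format, not universal rules. Sorting by the value a string denotes also works for ISO 8601 dates through `datetime.date.fromisoformat`.

```python
from datetime import date


def numeric_key(s: str) -> float:
    """Value of a displayed number, for sorting.

    Assumed input conventions: "," only as a thousands separator, and the
    typographic minus sign U+2212 in place of "-".
    """
    return float(s.replace("\u2212", "-").replace(",", ""))


values = ["30,000", "10", "\u22124", "89", "2.5"]
print(sorted(values, key=numeric_key))  # ['−4', '2.5', '10', '89', '30,000']

# Different strings can denote the same number, so they tie under the key.
print(numeric_key("2") == numeric_key("2.0"))     # True
print(numeric_key("2e3") == numeric_key("2000"))  # True

# Dates: sort ISO 8601 strings by the calendar date they denote.
dates = ["1994-04-27", "1983-02-01", "1997-11-15"]
print(sorted(dates, key=date.fromisoformat))
```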
Alphabetical

Main article: Alphabetical order

Alphabetical order is the basis for many systems of collation where items of information are identified by strings consisting principally of letters from an alphabet. The ordering of the strings relies on the existence of a standard ordering for the letters of the alphabet in question. (The system is not limited to alphabets in the strict technical sense; languages that use a syllabary or abugida, for example Cherokee, can use the same ordering principle provided there is a set ordering for the symbols used.)

To decide which of two strings comes first in alphabetical order, their first letters are compared. The string whose first letter appears earlier in the alphabet comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on, until the order is decided. (If one string runs out of letters to compare, then it is deemed to come first; for example, "cart" comes before "carthorse".) The result of arranging a set of strings in alphabetical order is that words with the same first letter are grouped together, and within such a group words with the same first two letters are grouped together, and so on.

Capital letters are typically treated as equivalent to their corresponding lowercase letters. (For alternative treatments in computerized systems, see Automation, below.)
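The letter-by-letter procedure can be sketched in Python as follows, including the treatment of capital letters as equivalent to lowercase ones. Ranking letters by their Unicode code points is an assumption. It matches the basic Latin a-z order, but not, for example, Spanish ñ or Turkish ö.

```python
def alphabetical_compare(a: str, b: str) -> int:
    """Compare two words letter by letter; return -1, 0, or 1.

    Capital letters are treated as equivalent to lowercase ones (via
    str.casefold); letters are ranked by code point.
    """
    a, b = a.casefold(), b.casefold()
    for x, y in zip(a, b):
        if x != y:
            # The first differing letter decides the order.
            return -1 if x < y else 1
    # One word is a prefix of the other: the shorter one comes first.
    return (len(a) > len(b)) - (len(a) < len(b))


print(alphabetical_compare("cart", "carthorse"))  # -1
print(alphabetical_compare("Carp", "carbon"))     # 1
print(alphabetical_compare("Cart", "cart"))       # 0
```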
Certain limitations, complications, and special conventions may apply when alphabetical order is used:

When strings contain spaces or other word dividers, the decision must be taken whether to ignore these dividers or to treat them as symbols preceding all other letters of the alphabet. For example, if the first approach is taken then "car park" will come after "carbon" and "carp" (as it would if it were written "carpark"), whereas in the second approach "car park" will come before those two words. The first rule is used in many (but not all) dictionaries, the second in telephone directories (so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo).

Abbreviations may be treated as if they were spelt out in full. For example, names containing "St." (short for the English word Saint) are often ordered as if they were written out as "Saint". There is also a traditional convention in English that surnames beginning with Mc and M' are listed as if those prefixes were written Mac.

Strings that represent personal names will often be listed by alphabetical order of surname, even if the given name comes first. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way.

Very common initial words, such as "The" in English, are often ignored for sorting purposes. So "The Shining" would be sorted as just "Shining" or "Shining, The".

When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example, 1776 would be sorted as if spelled out "seventeen seventy-six", and 24 heures du Mans as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as in 1337 for leet or Se7en for the movie title Seven, they may be sorted as if they were those letters.
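Three of these conventions can be combined in a hypothetical sort-key function: spelling out "St.", reading "Mc" and "M'" as "Mac", and either ignoring spaces or sorting them first. The regular expressions and the `ignore_spaces` flag are illustrative choices, not a published standard.

```python
import re


def directory_key(name: str, ignore_spaces: bool = False) -> str:
    """Hypothetical sort key applying some of the conventions above.

    - "St." is spelt out as "Saint".
    - A leading "Mc" or "M'" is read as "Mac".
    - Spaces are either dropped (dictionary style) or kept; a kept space
      has a lower code point than any letter, so it sorts first
      (telephone-directory style).
    """
    key = name.casefold()
    key = re.sub(r"\bst\.\s*", "saint ", key)
    key = re.sub(r"^(?:mc|m')", "mac", key)
    if ignore_spaces:
        key = key.replace(" ", "")
    return key


words = ["carp", "car park", "carbon"]
print(sorted(words, key=lambda w: directory_key(w, ignore_spaces=True)))
# ['carbon', 'carp', 'car park']
print(sorted(words, key=directory_key))
# ['car park', 'carbon', 'carp']

people = ["Wilson, Jimbo", "Wilson, Jim K", "Wilson, Jim"]
print(sorted(people, key=directory_key))
# ['Wilson, Jim', 'Wilson, Jim K', 'Wilson, Jimbo']

print(sorted(["Sheffield", "St. Albans", "Salisbury"], key=directory_key))
# ['St. Albans', 'Salisbury', 'Sheffield']
print(sorted(["McDonald", "MacKay", "Mabel", "Macbeth"], key=directory_key))
# ['Mabel', 'Macbeth', 'McDonald', 'MacKay']
```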
Languages have different conventions for treating modified letters and certain letter combinations. For example, in Spanish the letter ñ is treated as a basic letter following n, and the digraphs ch and ll were formerly (until 1994) treated as basic letters following c and l, although they are now alphabetized as two-letter combinations. A list of such conventions for various languages can be found at Alphabetical order § Language-specific conventions.
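The following sketch ranks Spanish letters by their place in the alphabet. It assumes lowercase input containing only the listed letters, so accented vowels such as á are not handled. The `traditional` flag switches to the pre-1994 alphabet, in which "ch" and "ll" count as single letters.

```python
MODERN = list("abcdefghijklmnñopqrstuvwxyz")
# Pre-1994 alphabet: "ch" after c and "ll" after l count as single letters.
TRADITIONAL = MODERN[:3] + ["ch"] + MODERN[3:12] + ["ll"] + MODERN[12:]

MODERN_RANK = {letter: i for i, letter in enumerate(MODERN)}
TRADITIONAL_RANK = {letter: i for i, letter in enumerate(TRADITIONAL)}


def spanish_key(word: str, traditional: bool = False) -> list[int]:
    """Rank each letter by its place in the chosen Spanish alphabet.

    Assumes the word contains only the letters listed above.
    """
    rank = TRADITIONAL_RANK if traditional else MODERN_RANK
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        # Try a two-character letter first; only "ch" and "ll" in the
        # traditional alphabet can match here.
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in rank:
            key.append(rank[pair])
            i += 2
        else:
            key.append(rank[word[i]])
            i += 1
    return key


print(sorted(["oso", "ñame", "nube"]))                   # ['nube', 'oso', 'ñame']
print(sorted(["oso", "ñame", "nube"], key=spanish_key))  # ['nube', 'ñame', 'oso']

words = ["llama", "chico", "luz", "cuna"]
print(sorted(words, key=spanish_key))  # ['chico', 'cuna', 'llama', 'luz']
print(sorted(words, key=lambda w: spanish_key(w, traditional=True)))
# ['cuna', 'chico', 'luz', 'llama']
```

Plain code-point order (the first line) puts ñ after every unaccented letter. The tailored ranking fixes that.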
In several languages the rules have changed over time, and so older dictionaries may use a different order than modern ones. Furthermore, collation may depend on use. For example, German dictionaries and telephone directories use different approaches.

Root sorting

Some Arabic dictionaries, such as Hans Wehr's bilingual A Dictionary of Modern Written Arabic, group and sort Arabic words by Semitic root. For example, the words kitāba (كتابة 'writing'), kitāb (كتاب 'book'), kātib (كاتب 'writer'), maktaba (مكتبة 'library'), maktab (مكتب 'office'), and maktūb (مكتوب 'fate' or 'written') are all grouped under the triliteral root k-t-b (ك ت ب), which denotes 'writing'.
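Extracting a root automatically requires morphological analysis. This Python sketch therefore assumes a hand-made lookup table (`ROOT_OF`, hypothetical) that maps transliterated words to their roots. It sorts by root and then groups with `itertools.groupby`. The d-r-s entries are included only to show a second group. Inside each group the order is simplified to the code-point order of the transliteration, which is not meant to reproduce any dictionary's internal arrangement.

```python
from itertools import groupby

# Hypothetical hand-made table for this sketch: finding a word's root
# automatically would require morphological analysis.
ROOT_OF = {
    "kitāba": "k-t-b",   # writing
    "kitāb": "k-t-b",    # book
    "kātib": "k-t-b",    # writer
    "maktaba": "k-t-b",  # library
    "maktab": "k-t-b",   # office
    "maktūb": "k-t-b",   # fate, written
    "darasa": "d-r-s",   # (he) studied
    "madrasa": "d-r-s",  # school
}

words = ["maktab", "madrasa", "kitāb", "darasa", "kātib", "maktūb", "kitāba", "maktaba"]

# Sort by root first; the secondary key (the transliteration's code points)
# is a simplification, not any dictionary's internal order.
entries = sorted(words, key=lambda w: (ROOT_OF[w], w))
grouped = {root: list(group) for root, group in groupby(entries, key=ROOT_OF.get)}

for root, members in grouped.items():
    print(root, members)
```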
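Radical-and-stroke sorting

See also: Chinese characters and Chinese character orders

Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as the hanzi of Chinese and the kanji of Japanese, whose thousands of symbols defy ordering by convention. In this system, common components of characters are identified; these are called radicals in Chinese and logographic systems derived from Chinese. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese character 妈 (meaning "mother") is sorted as a six-stroke character under the three-stroke primary radical 女.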
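A radical-and-stroke sort key reduces to a pair: the primary radical's position in a radical list, followed by the character's stroke count. This Python sketch assumes a tiny hand-filled table (`CHAR_INFO`, hypothetical) that uses Kangxi radical numbers. A real implementation would draw radical and stroke data from a dictionary database.

```python
# Illustrative data: primary radical, its Kangxi radical number, and the
# character's total stroke count. A real system would use a full dictionary.
CHAR_INFO = {
    "妈": ("女", 38, 6),  # "mother": six strokes under the radical 女
    "好": ("女", 38, 6),
    "奶": ("女", 38, 5),
    "妹": ("女", 38, 8),
    "字": ("子", 39, 6),
    "孩": ("子", 39, 9),
}


def radical_stroke_key(char: str) -> tuple[int, int]:
    """Group by radical (in radical-list order), then order by stroke count."""
    _radical, radical_number, strokes = CHAR_INFO[char]
    return (radical_number, strokes)


chars = "孩妈字奶妹好"
print("".join(sorted(chars, key=radical_stroke_key)))  # 奶妈好妹字孩
```

妈 and 好 have the same key, so their relative order comes only from Python's stable sort keeping input order. This is another instance of a total preorder.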
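The radical-and-stroke system is cumbersome compared to an alphabetical system, in which there are relatively few characters, all unambiguous. The choice of which components of a logograph comprise separate radicals and which radical is primary is not clear-cut. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word Tōkyō (東京) can be sorted as if it were spelled out in the Japanese characters of the hiragana syllabary as "to-u-ki-yo-u" (とうきょう), using the conventional sorting order for these characters.

Chinese characters can also be ordered by stroke-based sorting. In Greater China, surname stroke ordering is a convention in some official documents where people's names are listed without hierarchy.

Automation

When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation algorithm that allows the information to be sorted in a satisfactory manner for the application in question. Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections. However, not all of these criteria are easy to automate.

The simplest kind of automated collation is based on the numerical codes of the symbols in a character set, such as ASCII coding (or any of its supersets such as Unicode), with the symbols being ordered in increasing numerical order of their codes, and this ordering being extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking, lexicographical ordering). So a computer program might treat the characters a, b, C, d, and $ as being ordered $, C, a, b, d (the corresponding ASCII codes are $ = 36, a = 97, b = 98, C = 67, and d = 100). Therefore, strings beginning with C, M, or Z would be sorted before strings with lower-case a, b, etc. This is sometimes called ASCIIbetical order. It deviates from the standard alphabetical order, particularly due to the ordering of capital letters before all lower-case ones (and possibly the treatment of spaces and other non-letter characters). It is therefore often applied with certain alterations, the most obvious being case conversion before comparison of ASCII values. The conversion is often to uppercase for historical reasons: computers once handled text only in uppercase, a practice dating back to telegraph conventions.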
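Python's default string ordering compares code points, so it reproduces the ASCIIbetical order described above. Converting case inside the sort key is the usual alteration.

```python
chars = ["a", "b", "C", "d", "$"]

# Default string order compares code points ("ASCIIbetical" order).
print(sorted(chars))                    # ['$', 'C', 'a', 'b', 'd']
print([ord(c) for c in sorted(chars)])  # [36, 67, 97, 98, 100]

# The usual alteration: convert case before comparing the codes.
print(sorted(chars, key=str.upper))     # ['$', 'a', 'b', 'C', 'd']

words = ["banana", "Zebra", "apple", "Mango"]
print(sorted(words))                    # ['Mango', 'Zebra', 'apple', 'banana']
print(sorted(words, key=str.upper))     # ['apple', 'banana', 'Mango', 'Zebra']
```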
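In many collation algorithms, the comparison is based not on the numerical codes of the characters but on the collating sequence (the sequence in which the characters are assumed to come for the purpose of collation), as well as other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters, modified letters, digraphs, particular abbreviations, and so on, as mentioned above under Alphabetical order, and in detail in the Alphabetical order article. Such algorithms are potentially quite complex, possibly requiring several passes through the text.

Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in German dictionaries the word ökonomisch comes between offenbar and olfaktorisch, while Turkish dictionaries treat o and ö as different letters, placing oyun before öbür.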
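The sketch below illustrates why a single collation cannot serve both languages. The first key follows German dictionary practice and folds ö into o by stripping combining marks after Unicode NFD decomposition. The second key ranks letters by their position in the Turkish alphabet. Both are simplifications. The Turkish key assumes lowercase input, because Turkish case mapping of I and i differs from Python's default.

```python
import unicodedata

# Turkish alphabet order: ç, ğ, ı, ö, ş, ü are letters in their own right.
TURKISH = "abcçdefgğhıijklmnoöprsştuüvyz"


def german_dictionary_key(word: str) -> str:
    """Fold umlauted vowels into their base vowels (ö -> o), as in the German
    dictionary example; tie-breaking between o and ö is ignored here."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def turkish_key(word: str) -> list[int]:
    """Rank letters by position in the Turkish alphabet (ö after o).

    Assumes lowercase Turkish input; Turkish case mapping of I/ı and İ/i
    differs from Python's default str.lower().
    """
    return [TURKISH.index(ch) for ch in word]


german = ["olfaktorisch", "ökonomisch", "offenbar"]
print(sorted(german, key=german_dictionary_key))
# ['offenbar', 'ökonomisch', 'olfaktorisch']

turkish = ["öbür", "oyun"]
print(sorted(turkish, key=turkish_key))            # ['oyun', 'öbür']
print(sorted(turkish, key=german_dictionary_key))  # ['öbür', 'oyun']
```

Sorted with the German-style key, the Turkish pair comes out in the opposite order.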
A standard algorithm for collating any collection of strings composed of any standard Unicode symbols is the Unicode Collation Algorithm. This can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.
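The Unicode Collation Algorithm itself relies on large weight tables. The toy Python key below only imitates its idea of comparing at successive levels: base letters first, then accents, then case. It is an illustration under those simplifying assumptions, not an implementation of Unicode Technical Standard #10. Production code would normally call a library such as International Components for Unicode instead.

```python
import unicodedata


def multilevel_key(text: str) -> tuple[list[str], list[str], list[int]]:
    """Toy three-level key in the spirit of the Unicode Collation Algorithm.

    Level 1: base letters, case-folded (accents and case ignored).
    Level 2: accents attached to each base letter.
    Level 3: case (lowercase before uppercase).
    Later levels only matter when all earlier levels are equal, because
    Python compares the tuple element by element.
    """
    primary: list[str] = []
    secondary: list[str] = []
    tertiary: list[int] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if secondary:
                secondary[-1] += ch  # accent belongs to the previous letter
            continue
        primary.append(ch.casefold())
        secondary.append("")
        tertiary.append(1 if ch.isupper() else 0)
    return primary, secondary, tertiary


words = ["rôle", "Role", "role", "Rôle", "roles"]
print(sorted(words))                      # ['Role', 'Rôle', 'role', 'roles', 'rôle']
print(sorted(words, key=multilevel_key))  # ['role', 'Role', 'rôle', 'Rôle', 'roles']
```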
Sort keys

In some applications, the strings by which items are collated may differ from the identifiers that are displayed. For example, "The Shining" might be sorted as "Shining, The" (see Alphabetical order above), but it may still be desired to display it as "The Shining". In this case two sets of strings can be stored, one for display purposes, and another for collation purposes. Strings used for collation in this way are called sort keys.
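Here is a minimal sketch that stores a display string alongside a precomputed sort key. The `Title` class and the rule in `make_title`, which moves a leading "The " to the end, are hypothetical. Because the key is computed once when the item is stored, the sort does not need to recompute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    display: str   # string shown to the reader
    sort_key: str  # string used only for collation


def make_title(display: str) -> Title:
    """Hypothetical rule: move a leading English "The " to the end."""
    key = display
    if key.startswith("The "):
        key = key[len("The "):] + ", The"
    return Title(display=display, sort_key=key.casefold())


catalog = [make_title(t) for t in ["The Shining", "Carrie", "It"]]
for title in sorted(catalog, key=lambda t: t.sort_key):
    print(title.display)
# Carrie
# It
# The Shining
```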
Issues with numbers

Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though '7' comes after '1' in Unicode. This can be extended to Roman numerals. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example, Microsoft Windows does this when sorting file names.
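A common way to get this behavior is a "natural sort" key. The key splits a string into digit and non-digit runs and compares the digit runs as integers. This Python sketch handles non-negative integers only. Roman numerals and decimals would need extra rules.

```python
import re


def natural_key(text: str) -> list:
    """Split text into non-digit and digit runs; compare digit runs as integers.

    Only non-negative integers are recognised. Because re.split with a
    capturing group always puts digit runs at odd positions, two keys
    compare str with str and int with int, position by position.
    """
    parts = re.split(r"(\d+)", text.casefold())
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


labels = ["Figure 11a", "Figure 7b", "Figure 2"]
print(sorted(labels))                   # ['Figure 11a', 'Figure 2', 'Figure 7b']
print(sorted(labels, key=natural_key))  # ['Figure 2', 'Figure 7b', 'Figure 11a']
```

Passing the function as `key=` means it runs once per item rather than once per comparison. This limits part of the extra cost mentioned above.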
Sorting decimals properly is a bit more difficult, because different locales use different symbols for a decimal point, and sometimes the same character used as a decimal point is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.
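The ambiguity is easy to demonstrate. The same labels sort differently depending on whether the dot is read as a decimal point or as a separator. Both keys below are illustrative, and neither is the right answer in general.

```python
decimals = ["3.10", "3.9", "3.2"]
# Read as decimal numbers, "3.10" equals 3.1 and so precedes 3.2 and 3.9.
print(sorted(decimals, key=float))  # ['3.10', '3.2', '3.9']


def section_key(label: str) -> tuple[int, ...]:
    """Read each dot-separated part as a separate integer, as in "Section 3.2.5"."""
    return tuple(int(part) for part in label.split("."))


sections = ["3.10", "3.2.5", "3.9", "3.2"]
print(sorted(sections, key=section_key))  # ['3.2', '3.2.5', '3.9', '3.10']
```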
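Labeling of ordered items

In some contexts, numbers and letters are used not so much as a basis for establishing an ordering, but as a means of labeling items that are already ordered. For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way. Labeling series that may be used include ordinary Arabic numerals (1, 2, 3, ...), Roman numerals (I, II, III, ... or i, ii, iii, ...), or letters (A, B, C, ... or a, b, c, ...). (An alternative method for indicating list items, without numbering them, is to use a bulleted list.)

When letters of an alphabet are used for this purpose of enumeration, there are certain language-specific conventions as to which letters are used. For example, the Russian letters Ъ and Ь (which in writing are only used for modifying the preceding consonant), and usually also Ы, Й, and Ё, are omitted. Also, in many languages that use extended Latin script, the modified letters are often not used in enumeration.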
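The following Python sketch generates label series. Roman numerals follow the usual subtractive rules. Russian enumeration letters omit Ъ, Ь, Ы, Й, and Ё, which is the stricter of the conventions described above. The helper `labels` and its style names are hypothetical.

```python
# Russian alphabet in order; Ъ, Ь, Ы, Й and Ё are skipped for enumeration
# (the stricter of the conventions described above).
RUSSIAN = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
RUSSIAN_LABELS = [c for c in RUSSIAN if c not in "ЪЬЫЙЁ"]

ROMAN_VALUES = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def roman(n: int) -> str:
    """Roman numeral with subtractive forms (IV, IX, ...), for 1 <= n <= 3999."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    out = []
    for value, symbol in ROMAN_VALUES:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def labels(count: int, style: str) -> list[str]:
    """Hypothetical helper returning the first `count` labels of a series."""
    if style == "arabic":
        return [str(i) for i in range(1, count + 1)]
    if style == "roman":
        return [roman(i).lower() for i in range(1, count + 1)]
    if style == "russian":
        if count > len(RUSSIAN_LABELS):
            raise ValueError("not enough letters for this many items")
        return RUSSIAN_LABELS[:count]
    raise ValueError(f"unknown style: {style}")


print(labels(4, "roman"))    # ['i', 'ii', 'iii', 'iv']
print(labels(7, "russian"))  # ['А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж']
print(len(RUSSIAN_LABELS))   # 28
```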
See also

Alphabetical order
ASCIIbetical order
Chinese character orders
Sorting
Taxonomic sequence
Mac and Mc together
Unicode equivalence
Natural sort order

References

Abu-Haidar, J. A. (1983). "Review of A Dictionary of Modern Written Arabic (Arabic-English)". Bulletin of the School of Oriental and African Studies, University of London. 46 (2): 351–353. ISSN 0041-977X. JSTOR 615409.
"Hans Wehr Arabic-English Dictionary". ejtaal.net. Retrieved 2023-06-04.
Walters, Richard F. (1997). M Programming: A Comprehensive Guide. Digital Press.

External links

Unicode Collation Algorithm: Unicode Technical Standard #10
Collation in Spanish (archived 2006-08-13 at the Wayback Machine)
Collation of the names of the member states of the United Nations (archived 2005-08-30 at the Wayback Machine)
Typographical collation for many languages, as proposed in the List module of Cascading Style Sheets
Collation Charts: charts demonstrating language-specific sorting orders in various operating systems and DBMS
ICU Locale Explorer (archived 2008-05-11 at the Wayback Machine): an online demonstration of sorting in different languages that uses the Unicode Collation Algorithm with International Components for Unicode