Collation

This article is about collation in library, information, and computer science.

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

Collation differs from classification in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class.

Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order).

A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character strings and deciding which should come before the other. When an order has been defined in this way, a sorting algorithm can be used to put a list of any number of items into that order.

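The following minimal Python sketch shows the comparison-then-sort idea. The rule in `compare` is a hypothetical, deliberately simple one: it compares case-insensitively first and uses raw code points only as a tie-breaker. It is far simpler than the Unicode collation algorithm. The point is only that a pairwise rule plus a general-purpose sort yields a collated list.

```python
from functools import cmp_to_key

def compare(a: str, b: str) -> int:
    """Return a negative number if a sorts before b, positive if after, 0 if equal.

    Hypothetical rule: compare case-insensitively first, and fall back to
    raw code points only to break ties such as "apple" vs "Apple".
    """
    ka, kb = a.casefold(), b.casefold()
    if ka != kb:
        return -1 if ka < kb else 1
    if a != b:
        return -1 if a < b else 1
    return 0

def collate(items: list[str]) -> list[str]:
    # Any comparison-based sorting algorithm can apply the pairwise rule;
    # here Python's built-in sort is driven by it through cmp_to_key.
    return sorted(items, key=cmp_to_key(compare))

print(collate(["banana", "Apple", "cherry", "apple"]))
# ['Apple', 'apple', 'banana', 'cherry']
```
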
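The main advantage of collation is that it makes it fast and easy for a user to find an element in the list, or to confirm that it is absent from the list. In automatic systems this can be done using a binary search algorithm or interpolation search; manual searching may be performed using a roughly similar procedure, though this will often be done unconsciously. Other advantages are that one can easily find the first or last elements on the list (most likely to be useful in the case of numerically sorted data), or elements in a given range (useful again in the case of numerical data, and also with alphabetically ordered data when one may be sure of only the first few letters of the sought item or items).
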
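Once a list is collated, Python's standard `bisect` module provides a binary search over it. This sketch has two helpers with illustrative names. `contains` tests membership. `with_prefix` extracts every item sharing a prefix, since those items sit in one contiguous run. The sketch assumes the list was sorted with the same plain code-point ordering that `bisect` uses.

```python
import bisect

def contains(sorted_items: list[str], target: str) -> bool:
    """Binary search: about log2(n) comparisons to find target or confirm it is absent."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

def with_prefix(sorted_items: list[str], prefix: str) -> list[str]:
    """Return the contiguous run of items that start with prefix."""
    start = bisect.bisect_left(sorted_items, prefix)  # first item >= prefix
    end = start
    # Items sharing the prefix are adjacent in sorted order, so scan forward.
    while end < len(sorted_items) and sorted_items[end].startswith(prefix):
        end += 1
    return sorted_items[start:end]

words = sorted(["dog", "carthorse", "apple", "carp", "cart", "carbon"])
print(contains(words, "carp"), contains(words, "cat"))  # True False
print(with_prefix(words, "cart"))  # ['cart', 'carthorse']
print(words[0], words[-1])  # apple dog  (first and last elements)
```
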
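Ordering

Numerical and chronological

Strings representing numbers may be sorted based on the values of the numbers that they represent. For example, "−4", "2.5", "10", "89", "30,000". Pure application of this method may provide only a partial ordering on the strings, since different strings can represent the same number (as with "2" and "2.0" or, when scientific notation is used, "2e3" and "2000").

A similar approach may be taken with strings representing dates or other items that can be ordered chronologically or in some other natural fashion.
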
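Here is a minimal Python sketch of numerical and chronological ordering. Both key functions are hypothetical and rest on stated assumptions:

- `numeric_key` assumes English-style formatting, where a comma groups thousands and a full stop marks the decimal point. It accepts the Unicode minus sign as well as "-".
- `date_key` assumes DD/MM/YYYY strings.

```python
from datetime import date

def numeric_key(s: str) -> float:
    # Assumption: "," groups thousands and "." is the decimal point;
    # U+2212 MINUS SIGN is accepted as well as the ASCII hyphen-minus.
    return float(s.replace("\u2212", "-").replace(",", ""))

values = ["89", "30,000", "2.5", "\u22124", "10"]
print(sorted(values, key=numeric_key))
# ['−4', '2.5', '10', '89', '30,000']

# Different strings can map to the same number, so this gives only a partial
# ordering of the strings; sorted() is stable, so ties keep their input order.
print(sorted(["2.0", "2e3", "2", "2000"], key=numeric_key))
# ['2.0', '2', '2e3', '2000']

def date_key(s: str) -> date:
    # Assumption: DD/MM/YYYY. Parsing to a date gives chronological order.
    day, month, year = (int(part) for part in s.split("/"))
    return date(year, month, day)

print(sorted(["02/01/2020", "15/12/2019", "01/02/2020"], key=date_key))
# ['15/12/2019', '02/01/2020', '01/02/2020']
```

Dates written as ISO 8601 strings (YYYY-MM-DD) need no parsing: plain string comparison already puts them in chronological order.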
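Alphabetical

Main article: Alphabetical order

Alphabetical order is the basis for many systems of collation where items of information are identified by strings consisting principally of letters from an alphabet. The ordering of the strings relies on the existence of a standard ordering for the letters of the alphabet in question. (The system is not limited to alphabets in the strict technical sense; languages that use a syllabary or abugida, for example Cherokee, can use the same ordering principle provided there is a set ordering for the symbols used.)

To decide which of two strings comes first in alphabetical order, initially their first letters are compared. The string whose first letter appears earlier in the alphabet comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on, until the order is decided. (If one string runs out of letters to compare, then it is deemed to come first; for example, "cart" comes before "carthorse".) The result of arranging a set of strings in alphabetical order is that words with the same first letter are grouped together, and within such a group words with the same first two letters are grouped together, and so on.
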
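The letter-by-letter procedure can be written out directly. In this sketch, `alphabetically_before` is a hypothetical helper that takes the letter ordering as a parameter. The same loop therefore works for any alphabet or syllabary with a set ordering.

```python
def alphabetically_before(a: str, b: str, alphabet: str = "abcdefghijklmnopqrstuvwxyz") -> bool:
    """True if a comes strictly before b in letter-by-letter alphabetical order.

    Assumes both strings consist only of symbols listed in `alphabet`.
    """
    rank = {symbol: position for position, symbol in enumerate(alphabet)}
    for x, y in zip(a, b):
        if x != y:
            # The first differing letter decides the order.
            return rank[x] < rank[y]
    # One string ran out of letters: the shorter one (a prefix) comes first.
    return len(a) < len(b)

print(alphabetically_before("cart", "carthorse"))  # True
print(alphabetically_before("carthorse", "cart"))  # False
print(alphabetically_before("dog", "cat"))  # False
```
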
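Capital letters are typically treated as equivalent to their corresponding lowercase letters. (For alternative treatments in computerized systems, see Automation, below.)

Certain limitations, complications, and special conventions may apply when alphabetical order is used:

When strings contain spaces or other word dividers, the decision must be taken whether to ignore these dividers or to treat them as symbols preceding all other letters of the alphabet. For example, if the first approach is taken then "car park" will come after "carbon" and "carp" (as it would if it were written "carpark"), whereas in the second approach "car park" will come before those two words. The first rule is used in many (but not all) dictionaries, the second in telephone directories (so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo).
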
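The short Python sketch below compares the two treatments of word dividers. `ignore_dividers` is a hypothetical key function for the first approach. The second approach needs no key at all for lower-case ASCII strings, because the space character (code 32) already precedes every letter in code-point order.

```python
def ignore_dividers(s: str) -> str:
    # Approach 1: compare "car park" as if it were written "carpark".
    return s.replace(" ", "")

words = ["carp", "car park", "carbon"]
print(sorted(words, key=ignore_dividers))  # ['carbon', 'carp', 'car park']

# Approach 2: the space counts as a symbol preceding all letters.
print(sorted(words))  # ['car park', 'carbon', 'carp']

# Telephone-directory style keeps "Wilson, Jim K" with the other "Wilson, Jim" entries.
print(sorted(["Wilson, Jimbo", "Wilson, Jim K", "Wilson, Jim"]))
# ['Wilson, Jim', 'Wilson, Jim K', 'Wilson, Jimbo']
```
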
Abbreviations may be treated as if they were spelt out in full. For example, names containing "St." (short for the English word Saint) are often ordered as if they were written out as "Saint". There is also a traditional convention in English that surnames beginning Mc and M' are listed as if those prefixes were written Mac.

Strings that represent personal names will often be listed by alphabetical order of surname, even if the given name comes first. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way.

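Filing conventions like these are usually implemented as a key function that rewrites each name before comparison. The sketch below is hypothetical and rests on these assumptions:

- Names arrive already split into given name and surname.
- A leading "St." is expanded to "Saint".
- "Mc" and "M'" are filed as "Mac".
- The surname is placed first.

Real filing rules vary by institution.

```python
import re

def surname_key(surname: str) -> str:
    # "St." is filed as if spelt out in full: "St. Clair" -> "Saint Clair".
    s = re.sub(r"^St\.\s*", "Saint ", surname)
    # Traditional English convention: "Mc" and "M'" are filed as "Mac".
    s = re.sub(r"^M(?:c|')", "Mac", s)
    return s.casefold()

def person_key(given: str, surname: str) -> tuple[str, str]:
    # Surname first, even though the name is written given-name first.
    return (surname_key(surname), given.casefold())

people = [("Jane", "Madden"), ("Ellen", "McDonald"), ("Tom", "MacArthur"),
          ("Ann", "St. Clair"), ("Ben", "Salter"), ("Juan", "Hernandes"),
          ("Brian", "O'Leary")]
for given, surname in sorted(people, key=lambda p: person_key(*p)):
    print(f"{surname}, {given}")
```

With these keys, McDonald is filed between MacArthur and Madden, and St. Clair before Salter. Plain code-point order would place each of them after its neighbour.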
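Very common initial words, such as The in English, are often ignored for sorting purposes. So The Shining would be sorted as just "Shining" or "Shining, The".

When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example 1776 would be sorted as if spelled out "seventeen seventy-six", and 24 heures du Mans as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as in 1337 for leet or Se7en for the movie title Seven, they may be sorted as if they were those letters.

Languages have different conventions for treating modified letters and certain letter combinations. For example, in Spanish the letter ñ is treated as a basic letter following n, and the digraphs ch and ll were formerly (until 1994) treated as basic letters following c and l, although they are now alphabetized as two-letter combinations. A list of such conventions for various languages can be found at Alphabetical order § Language-specific conventions.
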
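Language-specific letters can be handled by ranking tokens against an explicit alphabet. This sketch defines two hypothetical alphabets:

- `MODERN` places ñ after n.
- `TRADITIONAL` also treats ch and ll as single letters after c and l, as in pre-1994 Spanish ordering.

The sketch handles only unaccented letters plus ñ. Accented vowels such as á fall outside it and raise an error.

```python
MODERN = list("abcdefghijklmn") + ["ñ"] + list("opqrstuvwxyz")
TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
               + list("mn") + ["ñ"] + list("opqrstuvwxyz"))

def tokenize(word: str, alphabet: list[str]) -> list[str]:
    # Greedy longest match, so "ch" and "ll" count as single letters
    # whenever the alphabet contains them.
    letters = sorted(alphabet, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        for letter in letters:
            if word.startswith(letter, i):
                tokens.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"character {word[i]!r} is not in this alphabet")
    return tokens

def spanish_key(word: str, alphabet: list[str]) -> list[int]:
    rank = {letter: n for n, letter in enumerate(alphabet)}
    return [rank[t] for t in tokenize(word.lower(), alphabet)]

print(sorted(["oso", "ñame", "nube"]))  # ['nube', 'oso', 'ñame']  (code points: wrong)
print(sorted(["oso", "ñame", "nube"], key=lambda w: spanish_key(w, MODERN)))
# ['nube', 'ñame', 'oso']
words = ["llama", "cuna", "luz", "chico", "dado", "mano"]
print(sorted(words, key=lambda w: spanish_key(w, TRADITIONAL)))
# ['cuna', 'chico', 'dado', 'luz', 'llama', 'mano']
```
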
In several languages the rules have changed over time, and so older dictionaries may use a different order than modern ones. Furthermore, collation may depend on use. For example, German dictionaries and telephone directories use different approaches.

Root sorting

Some Arabic dictionaries, such as Hans Wehr's bilingual A Dictionary of Modern Written Arabic, group and sort Arabic words by semitic root. For example, the words kitāba (كتابة 'writing'), kitāb (كتاب 'book'), kātib (كاتب 'writer'), maktaba (مكتبة 'library'), maktab (مكتب 'office'), and maktūb (مكتوب 'fate' or 'written') are agglomerated under the triliteral root k-t-b (ك ت ب), which denotes 'writing'.

Radical-and-stroke sorting

See also: Chinese characters and Chinese character orders

Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as the hanzi of Chinese and the kanji of Japanese, whose thousands of symbols defy ordering by convention. In this system, common components of characters are identified; these are called radicals in Chinese and logographic systems derived from Chinese. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese character 妈 (meaning "mother") is sorted as a six-stroke character under the three-stroke primary radical 女.

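Radical-and-stroke ordering amounts to sorting on a compound key. The table below is illustrative only; a real implementation would draw on a full dictionary. Ordering radicals by their own stroke count, with the radical character as a tie-breaker, is also a simplification of the conventional radical sequence.

```python
# Illustrative data: (character, primary radical, strokes in the radical, total strokes).
# 他 and 你 fall under the radical 人 (written 亻 inside those characters).
CHARACTERS = [
    ("妹", "女", 3, 8),
    ("你", "人", 2, 7),
    ("妈", "女", 3, 6),  # the six-stroke character under the three-stroke radical 女
    ("他", "人", 2, 5),
]

def radical_stroke_key(entry: tuple[str, str, int, int]) -> tuple[int, str, int]:
    _char, radical, radical_strokes, total_strokes = entry
    # Group by radical (radicals ordered here by their own stroke count),
    # then order by stroke count within the radical. Within one radical,
    # total strokes and residual strokes give the same order.
    return (radical_strokes, radical, total_strokes)

ordered = "".join(entry[0] for entry in sorted(CHARACTERS, key=radical_stroke_key))
print(ordered)  # 他你妈妹
```
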
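The radical-and-stroke system is cumbersome compared to an alphabetical system in which there are a few characters, all unambiguous. The choice of which components of a logograph comprise separate radicals and which radical is primary is not clear-cut. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word Tōkyō (東京) can be sorted as if it were spelled out in the Japanese characters of the hiragana syllabary as "to-u-ki-yo-u" (とうきょう), using the conventional sorting order for these characters.

Chinese characters can also be ordered by stroke-based sorting. In Greater China, surname stroke ordering is a convention in some official documents where people's names are listed without hierarchy.

Automation

When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation algorithm that allows the information to be sorted in a satisfactory manner for the application in question. Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections. However, not all of these criteria are easy to automate.

The simplest kind of automated collation is based on the numerical codes of the symbols in a character set, such as ASCII coding (or any of its supersets such as Unicode), with the symbols being ordered in increasing numerical order of their codes, and this ordering being extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking, lexicographical ordering). So a computer program might treat the characters a, b, C, d, and $ as being ordered $, C, a, b, d (the corresponding ASCII codes are $ = 36, a = 97, b = 98, C = 67, and d = 100). Therefore, strings beginning with C, M, or Z would be sorted before strings with lower-case a, b, etc. This is sometimes called ASCIIbetical order. It deviates from the standard alphabetical order, particularly due to the ordering of capital letters before all lower-case ones (and possibly the treatment of spaces and other non-letter characters). It is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons: early computers handled text only in uppercase, a practice dating back to telegraph conventions) before comparison of ASCII values.
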
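Python's default string comparison uses code points, so it reproduces ASCIIbetical order for ASCII text. Case conversion before comparison can be added with a key function. A minimal sketch (helper names are illustrative):

```python
def asciibetical(items: list[str]) -> list[str]:
    # Plain code-point order: '$' (36) < 'C' (67) < 'a' (97) < 'b' (98) < 'd' (100).
    return sorted(items)

def case_converted(items: list[str]) -> list[str]:
    # Convert to upper case before comparing codes, the common alteration.
    return sorted(items, key=str.upper)

chars = ["a", "b", "C", "d", "$"]
print([ord(c) for c in chars])  # [97, 98, 67, 100, 36]
print(asciibetical(chars))  # ['$', 'C', 'a', 'b', 'd']
print(case_converted(chars))  # ['$', 'a', 'b', 'C', 'd']

words = ["apple", "Zebra", "mango", "Cherry"]
print(asciibetical(words))  # ['Cherry', 'Zebra', 'apple', 'mango']
print(case_converted(words))  # ['apple', 'Cherry', 'mango', 'Zebra']
```
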
In many collation algorithms, the comparison is based not on the numerical codes of the characters but on a collating sequence (a sequence in which the characters are assumed to come for the purpose of collation), together with other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters, modified letters, digraphs, particular abbreviations, and so on, as described above under Alphabetical and in detail in the Alphabetical order article. Such algorithms are potentially quite complex, possibly requiring several passes through the text.

Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in German dictionaries the word ökonomisch comes between offenbar and olfaktorisch, while Turkish dictionaries treat o and ö as different letters, placing oyun before öbür.

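The German and Turkish examples can be reproduced with two different collating sequences. Both keys below are simplified sketches, not the full rules of either language:

- `german_key` files ä, ö, ü with their base vowels and ß as ss.
- `turkish_key` ranks letters by their position in the 29-letter Turkish alphabet and assumes lower-case input.

```python
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH_RANK = {letter: i for i, letter in enumerate(TURKISH_ALPHABET)}

def german_key(word: str) -> str:
    # Simplified primary level: ö sorts with o, ü with u, and so on.
    return word.lower().translate(GERMAN_FOLD)

def turkish_key(word: str) -> list[int]:
    # o and ö are distinct letters; ö follows o in the alphabet.
    return [TURKISH_RANK[letter] for letter in word]

german = ["olfaktorisch", "ökonomisch", "offenbar"]
print(sorted(german))  # ['offenbar', 'olfaktorisch', 'ökonomisch']  (code points)
print(sorted(german, key=german_key))  # ['offenbar', 'ökonomisch', 'olfaktorisch']

turkish = ["öbür", "oyun"]
print(sorted(turkish, key=turkish_key))  # ['oyun', 'öbür']
print(sorted(turkish, key=german_key))  # ['öbür', 'oyun']  (wrong for Turkish)
```
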
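A standard algorithm for collating any collection of strings composed of any standard Unicode symbols is the Unicode Collation Algorithm. This can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository (CLDR).

Sort keys

In some applications, the strings by which items are collated may differ from the identifiers that are displayed. For example, The Shining might be sorted as Shining, The (see Alphabetical above), but it may still be desired to display it as The Shining. In this case two sets of strings can be stored, one for display purposes, and another for collation purposes. Strings used for collation in this way are called sort keys.
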
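Here is a sketch of storing a sort key alongside each display string. The article-moving rule in `make_sort_key` is a hypothetical example of an application-specific convention. The key is computed once and stored, so the display string never has to be reparsed during sorting.

```python
def make_sort_key(title: str) -> str:
    # Hypothetical rule: move a leading English article to the end,
    # so "The Shining" is collated as "Shining, The".
    for article in ("The ", "An ", "A "):
        if title.startswith(article):
            return f"{title[len(article):]}, {article.strip()}"
    return title

titles = ["The Shining", "Carrie", "It", "A Time to Kill"]
# Store (sort key, display string) pairs; only the key takes part in ordering.
catalogue = sorted((make_sort_key(t).casefold(), t) for t in titles)
print([display for _key, display in catalogue])
# ['Carrie', 'It', 'The Shining', 'A Time to Kill']
```
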
Issues with numbers

Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though the character '7' comes after '1' in Unicode code-point order. This can be extended to Roman numerals. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example, Microsoft Windows does this when sorting file names.

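A common way to get this "natural" order is to split each string into digit and non-digit runs and compare the digit runs as integers. This is a sketch only, not a description of how Microsoft Windows or any other product implements it. Two limitations apply: equal-valued runs such as "7" and "007" compare as equal, and Roman numerals are not handled.

```python
import re

def natural_key(text: str) -> list:
    # re.split with a capturing group alternates text, digits, text, digits, ...
    # so odd positions are always digit runs and even positions always text.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part.casefold() for i, part in enumerate(parts)]

figures = ["Figure 11a", "Figure 7b", "Figure 2"]
print(sorted(figures))  # ['Figure 11a', 'Figure 2', 'Figure 7b']
print(sorted(figures, key=natural_key))  # ['Figure 2', 'Figure 7b', 'Figure 11a']
```

Passing the function as `key=` computes each key once per item rather than once per comparison. This limits, but does not remove, the extra cost the text mentions.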
Sorting decimals properly is a bit more difficult, because different locales use different symbols for a decimal point, and sometimes the same character used as a decimal point is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.

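Because the right rule depends on the application, one possible choice is sketched here as an assumption: treat every separator as a level boundary. Under this rule "3.10" means section 3, subsection 10, which comes after 3.9, rather than the decimal 3.1. The hypothetical `section_key` takes the separator as a parameter because different locales use different decimal symbols.

```python
def section_key(label: str, separator: str = ".") -> tuple[int, ...]:
    # Assumed rule: each separator starts a new level, and each level is an integer.
    return tuple(int(part) for part in label.split(separator))

sections = ["3.10", "3.2.5", "3.2", "10.1", "3.9"]
print(sorted(sections, key=section_key))
# ['3.2', '3.2.5', '3.9', '3.10', '10.1']

# Reading the same labels as decimal numbers gives a different order.
print(sorted(["3.10", "3.9", "3.2"], key=float))  # ['3.10', '3.2', '3.9']
```
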
Labeling of ordered items

In some contexts, numbers and letters are used not so much as a basis for establishing an ordering, but as a means of labeling items that are already ordered. For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way. Labeling series that may be used include ordinary Arabic numerals (1, 2, 3, ...), Roman numerals (I, II, III, ... or i, ii, iii, ...), or letters (A, B, C, ... or a, b, c, ...). (An alternative method for indicating list items, without numbering them, is to use a bulleted list.)

When letters of an alphabet are used for this purpose of enumeration, there are certain language-specific conventions as to which letters are used. For example, the Russian letters Ъ and Ь (which in writing are only used for modifying the preceding consonant), and usually also Ы, Й, and Ё, are omitted. Also, in many languages that use extended Latin script, the modified letters are often not used in enumeration.

1122: 912: 355: 175: 38: 602:(東京) can be sorted as if it were spelled out in the Japanese characters of the 432: 330: 298: 1056: 1140: 1045:
Bulletin of the School of Oriental and African Studies, University of London
1015: 928: 873: 630: 452: 260: 1014:
Historically, computers only handled text in uppercase (this dates back to
1178:: An online demonstration of sorting in different languages that uses the 646: 603: 256: 610:-u" (とうきょう), using the conventional sorting order for these characters. 975: 861: 833: 809: 650: 264: 1064: 1040: 448: 219: 139: 1152: 1041:"Review of A Dictionary of Modern Written Arabic (Arabic-English)" 965: 642: 582: 574: 310: 1141:
Collation of the names of the member states of the United Nations
940: 936: 932: 924: 920: 758:, particular abbreviations, and so on, as mentioned above under 530: 372: 359:
would be sorted as if spelled out "seventeen seventy-six", and
534: 415:
were formerly (until 1994) treated as basic letters following
32: 927:(which in writing are only used for modifying the preceding 189:
defines an order through the process of comparing two given
230:
A similar approach may be taken with strings representing
174:
Formally speaking, a collation method typically defines a
911:
When letters of an alphabet are used for this purpose of
943:, are omitted. Also in many languages that use extended 425:
Alphabetical order § Language-specific conventions
340:
in English, are often ignored for sorting purposes. So
613:
In addition, Chinese characters can also be sorted by
573:, used for non-alphabetic writing systems such as the 844:
above), but it may still be desired to display it as
16:
Assembly of written information into a standard order
657:). So a computer program might treat the characters 346:
would be sorted as just "Shining" or "Shining, The".
385:, they may be sorted as if they were those letters. 63:. Unsourced material may be challenged and removed. 525:'fate,' or 'written'), are agglomerated under the 388:Languages have different conventions for treating 392:and certain letter combinations. For example, in 729:would be sorted before strings with lower-case 883:, and sometimes the same character used as a 322:are listed as if those prefixes were written 8: 542: 520: 510: 500: 490: 480: 470: 360: 1153:Typographical collation for many languages 717:= 100). Therefore, strings beginning with 1109:, Richard F. Walters, Digital Press, 1997 1100: 1098: 123:Learn how and when to remove this message 282: 1031: 1007: 841: 759: 400:is treated as a basic letter following 1079:"Hans Wehr Arabic-English Dictionary" 458:A Dictionary of Modern Written Arabic 7: 1184:International Components for Unicode 1155:, as proposed in the List module of 1106:M Programming: A Comprehensive Guide 61:adding citations to reliable sources 951:are often not used in enumeration. 697:(the corresponding ASCII codes are 543: 521: 511: 501: 491: 481: 471: 336:Very common initial words, such as 896:that may be used include ordinary 185:A collation algorithm such as the 14: 461:, group and sort Arabic words by 349:When some of the strings contain 1125:: Unicode Technical Standard #10 737:, etc. This is sometimes called 37: 171:to arrange the items by class. 48:needs additional citations for 797:as different letters, placing 1: 818:Common Locale Data Repository 569:Another form of collation is 547:), which denotes 'writing'. 227:is used, "2e3" and "2000"). 1180:Unicode Collation Algorithm 1123:Unicode Collation Algorithm 814:Unicode Collation Algorithm 214:Numerical and chronological 187:Unicode collation algorithm 1215: 1039:Abu-Haidar, J. A. (1983). 864:. This can be extended to 571:radical-and-stroke sorting 551:Radical-and-stroke sorting 439:use different approaches. 
241: 25: 21:Collation (disambiguation) 18: 891:Labeling of ordered items 465:. For example, the words 1146:August 30, 2005, at the 971:Chinese character orders 655:lexicographical ordering 563:Chinese character orders 872:does this when sorting 762:, and in detail in the 619:surname stroke ordering 199:binary search algorithm 161:Collation differs from 1157:Cascading Style Sheets 773:dictionaries the word 645:coding (or any of its 606:syllabary as "to-u-ki- 451:dictionaries, such as 361: 437:telephone directories 303:telephone directories 292:When strings contain 218:Strings representing 1129:Collation in Spanish 931:), and usually also 617:. In Greater China, 615:stroke-based sorting 379:for the movie title 203:interpolation search 57:improve this article 1169:ICU Locale Explorer 991:Unicode equivalence 986:Mac and Mc together 856:Issues with numbers 789:dictionaries treat 283:Automated collation 225:scientific notation 1174:2008-05-11 at the 1134:2006-08-13 at the 996:Natural sort order 981:Taxonomic sequence 966:Asciibetical order 961:Alphabetical order 842:Alphabetical order 764:Alphabetical order 760:Alphabetical order 748:collating sequence 740:ASCIIbetical order 559:Chinese characters 249:Alphabetical order 244:Alphabetical order 144:alphabetical order 870:Microsoft Windows 677:as being ordered 363:24 heures du Mans 191:character strings 169:sorting algorithm 133: 132: 125: 107: 1206: 1163:Collation Charts 1110: 1102: 1093: 1092: 1090: 1089: 1075: 1069: 1068: 1036: 1019: 1012: 949:modified letters 900:(1, 2, 3, ...), 752:modified letters 546: 545: 524: 523: 514: 513: 504: 503: 494: 493: 484: 483: 474: 473: 390:modified letters 366: 301:, the second in 152:library catalogs 128: 121: 117: 114: 108: 106: 65: 41: 33: 1214: 1213: 1209: 1208: 1207: 1205: 1204: 1203: 1189: 1188: 1176:Wayback Machine 1148:Wayback Machine 1136:Wayback Machine 1119: 1114: 1113: 1103: 1096: 1087: 1085: 1077: 1076: 1072: 1038: 1037: 1033: 1028: 1023: 1022: 1013: 1009: 1004: 957: 
898:Arabic numerals 893: 858: 826: 812:symbols is the 627: 609: 553: 445: 279:Capital letters 246: 240: 216: 211: 156:reference books 140:numerical order 129: 118: 112: 109: 66: 64: 54: 42: 31: 24: 17: 12: 11: 5: 1212: 1210: 1202: 1201: 1191: 1190: 1187: 1186: 1166: 1160: 1150: 1138: 1126: 1118: 1117:External links 1115: 1112: 1111: 1094: 1070: 1051:(2): 351–353. 1030: 1029: 1027: 1024: 1021: 1020: 1006: 1005: 1003: 1000: 999: 998: 993: 988: 983: 978: 973: 968: 963: 956: 953: 902:Roman numerals 892: 889: 866:Roman numerals 857: 854: 825: 822: 777:comes between 626: 623: 607: 567: 566: 552: 549: 444: 441: 429: 428: 386: 347: 334: 327: 306: 267:, for example 242:Main article: 239: 236: 215: 212: 210: 207: 180:total preorder 164:classification 148:filing systems 131: 130: 45: 43: 36: 15: 13: 10: 9: 6: 4: 3: 2: 1211: 1200: 1197: 1196: 1194: 1185: 1181: 1177: 1173: 1170: 1167: 1164: 1161: 1158: 1154: 1151: 1149: 1145: 1142: 1139: 1137: 1133: 1130: 1127: 1124: 1121: 1120: 1116: 1108: 1107: 1101: 1099: 1095: 1084: 1080: 1074: 1071: 1066: 1062: 1058: 1054: 1050: 1046: 1042: 1035: 1032: 1025: 1018:conventions). 
1017: 1011: 1008: 1001: 997: 994: 992: 989: 987: 984: 982: 979: 977: 974: 972: 969: 967: 964: 962: 959: 958: 954: 952: 950: 946: 942: 938: 934: 930: 926: 922: 918: 914: 909: 907: 906:bulleted list 903: 899: 890: 888: 886: 885:decimal point 882: 881:decimal point 877: 875: 871: 867: 863: 855: 853: 851: 847: 843: 839: 835: 831: 823: 821: 819: 815: 811: 806: 804: 800: 796: 792: 788: 784: 780: 776: 772: 767: 765: 761: 757: 753: 749: 744: 742: 741: 736: 732: 728: 724: 720: 716: 712: 708: 704: 700: 696: 692: 688: 684: 680: 676: 672: 668: 664: 660: 656: 652: 648: 644: 640: 639:character set 635: 632: 624: 622: 620: 616: 611: 605: 601: 595: 592: 588: 584: 580: 576: 572: 565: 564: 560: 555: 554: 550: 548: 540: 536: 532: 528: 518: 508: 498: 488: 478: 468: 464: 460: 459: 455:'s bilingual 454: 450: 442: 440: 438: 434: 426: 422: 418: 414: 410: 407: 403: 399: 395: 391: 387: 384: 383: 378: 374: 370: 365: 364: 358: 357: 352: 348: 345: 344: 339: 335: 332: 328: 325: 321: 317: 313: 312: 307: 304: 300: 295: 291: 290: 289: 286: 284: 280: 276: 272: 270: 266: 262: 258: 254: 250: 245: 237: 235: 233: 228: 226: 221: 213: 208: 206: 204: 200: 194: 192: 188: 183: 181: 177: 172: 170: 166: 165: 159: 157: 153: 149: 145: 141: 137: 127: 124: 116: 105: 102: 98: 95: 91: 88: 84: 81: 77: 74: –  73: 69: 68:Find sources: 62: 58: 52: 51: 46:This article 44: 40: 35: 34: 29: 22: 1105: 1086:. 
Retrieved 1082: 1073: 1048: 1044: 1034: 1010: 945:Latin script 910: 894: 878: 859: 849: 845: 838:Shining, The 837: 829: 827: 807: 802: 798: 794: 790: 783:olfaktorisch 782: 778: 774: 768: 747: 745: 738: 734: 730: 726: 722: 718: 714: 710: 706: 702: 698: 694: 690: 686: 682: 678: 674: 670: 666: 662: 658: 636: 628: 612: 599: 596: 570: 568: 556: 516: 506: 505:'library'), 496: 486: 476: 475:'writing'), 466: 463:semitic root 456: 446: 443:Root sorting 433:dictionaries 430: 420: 416: 412: 408: 401: 397: 380: 376: 368: 354: 341: 337: 323: 319: 315: 309: 299:dictionaries 287: 277: 273: 247: 238:Alphabetical 229: 217: 195: 184: 173: 162: 160: 135: 134: 119: 110: 100: 93: 86: 79: 67: 55:Please help 50:verification 47: 913:enumeration 846:The Shining 830:The Shining 515:'office'), 495:'writer'), 396:the letter 343:The Shining 176:total order 72:"Collation" 1088:2023-06-04 1083:ejtaal.net 1026:References 874:file names 775:ökonomisch 713:= 67, and 641:, such as 625:Automation 527:triliteral 404:, and the 331:given name 285:, below.) 
113:March 2019 83:newspapers 28:WP:SORTKEY 1199:Collation 1057:0041-977X 1016:telegraph 929:consonant 850:sort keys 832:might be 824:Sort keys 647:supersets 631:algorithm 557:See also 485:'book'), 453:Hans Wehr 261:syllabary 136:Collation 1193:Category 1172:Archived 1144:Archived 1132:Archived 955:See also 919:letters 785:, while 779:offenbar 756:digraphs 649:such as 604:hiragana 591:radicals 587:Japanese 581:and the 406:digraphs 351:numerals 269:Cherokee 257:alphabet 255:from an 209:Ordering 976:Sorting 917:Russian 862:Unicode 810:Unicode 801:before 787:Turkish 651:Unicode 579:Chinese 497:maktaba 394:Spanish 265:abugida 253:letters 220:numbers 97:scholar 1065:615409 1063:  1055:  947:, the 939:, and 834:sorted 771:German 709:= 98, 705:= 97, 701:= 36, 673:, and 517:maktūb 507:maktab 467:kitāba 449:Arabic 294:spaces 154:, and 99:  92:  85:  78:  70:  1182:with 1061:JSTOR 1002:Notes 840:(see 725:, or 643:ASCII 600:Tōkyō 583:kanji 575:hanzi 544:ك ت ب 529:root 522:مكتوب 502:مكتبة 487:kātib 477:kitāb 472:كتابة 447:Some 382:Seven 377:Se7en 311:Saint 232:dates 104:JSTOR 90:books 1053:ISSN 923:and 803:öbür 799:oyun 793:and 781:and 561:and 512:مكتب 492:كاتب 482:كتاب 435:and 419:and 411:and 373:leet 371:for 369:1337 356:1776 318:and 76:news 908:.) 836:as 585:of 577:of 375:or 338:The 324:Mac 263:or 201:or 142:or 59:by 1195:: 1097:^ 1081:. 1059:. 1049:46 1047:. 1043:. 935:, 876:. 852:. 820:. 805:. 754:, 733:, 721:, 699:$ 693:, 689:, 685:, 681:, 679:$ 675:$ 669:, 665:, 661:, 608:yo 413:ll 409:ch 320:M' 316:Mc 158:. 150:, 1159:. 1091:. 1067:. 941:Ё 937:Й 933:Ы 925:Ь 921:Ъ 795:ö 791:o 735:b 731:a 727:Z 723:M 719:C 715:d 711:C 707:b 703:a 695:d 691:b 687:a 683:C 671:d 667:C 663:b 659:a 541:( 539:b 537:- 535:t 533:- 531:k 519:( 509:( 499:( 489:( 479:( 469:( 427:. 421:l 417:c 402:n 398:ñ 326:. 126:) 120:( 115:) 111:( 101:· 94:· 87:· 80:· 53:. 30:. 23:.

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.