Knowledge (XXG)

Collation


Manual searching may be performed using a procedure roughly similar to binary search, though this will often be done unconsciously. Other advantages are that one can easily find the first or last elements on the list (most likely to be useful in the case of numerically sorted data), or elements in a given range (useful again in the case of numerical data, and also with alphabetically ordered data when one may be sure of only the first few letters of the sought item or items).
ASCIIbetical order deviates from the standard alphabetical order, particularly due to the ordering of capital letters before all lower-case ones (and possibly the treatment of spaces and other non-letter characters). It is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons) before comparison of ASCII values.
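The effect of case conversion before comparison can be sketched in a few lines of Python. This is only an illustration: the word list is invented, and Python's built-in string comparison (which compares code points) stands in for a comparison of ASCII values.

```python
words = ["banana", "Cherry", "apple", "Banana"]

# Plain comparison of character codes: every capital letter sorts
# before every lower-case letter.
asciibetical = sorted(words)

# Converting to upper case before comparison removes the case distinction.
# sorted() is stable, so "banana" and "Banana" keep their input order.
case_converted = sorted(words, key=str.upper)

print(asciibetical)
print(case_converted)
```

The original strings are kept for display; only the comparison uses the converted form.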
If one string runs out of letters to compare before the order is decided, it is deemed to come first; for example, "cart" comes before "carthorse". The result of arranging a set of strings in alphabetical order is that words with the same first letter are grouped together, and within such a group words with the same first two letters are grouped together, and so on.
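The letter-by-letter rule can be written out directly. The sketch below assumes lower-case words in the basic Latin alphabet, so that Python's character comparison coincides with alphabet order; the function name `comes_first` is made up for this example.

```python
def comes_first(a: str, b: str) -> str:
    """Return whichever of two lower-case words precedes the other."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing letter decides the order.
            return a if x < y else b
    # Every compared letter matched: the string that ran out first wins.
    return a if len(a) <= len(b) else b


print(comes_first("cart", "carthorse"))
print(comes_first("carp", "carbon"))
```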
A collating sequence is a sequence in which the characters are assumed to come for the purpose of collation, together with other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters, modified letters, digraphs, particular abbreviations, and so on.
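As a rough sketch of a collating sequence, the following Python ranks characters by their position in an explicit sequence rather than by their numerical codes. The sequence shown (the basic Latin letters with ñ placed after n) and the names `SEQUENCE`, `RANK` and `collation_key` are assumptions made for illustration, not a complete set of rules for any language.

```python
# Hypothetical collating sequence: ñ is a separate letter following n.
SEQUENCE = "abcdefghijklmnñopqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(SEQUENCE)}


def collation_key(word: str) -> list[int]:
    # Case is ignored; characters outside the sequence sort after all letters.
    return [RANK.get(ch, len(SEQUENCE)) for ch in word.lower()]


words = ["oso", "ñu", "nube", "nota"]
by_code = sorted(words)                         # compares numerical codes
by_sequence = sorted(words, key=collation_key)  # compares sequence positions

print(by_code)
print(by_sequence)
```

Sorting by numerical code places "ñu" last, whereas the collating sequence places it between "nube" and "oso".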
A collation algorithm should allow the information to be sorted in a satisfactory manner for the application in question. Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections. However, not all of these criteria are easy to automate.
To decide which of two strings comes first in alphabetical order, their first letters are compared. The string whose first letter appears earlier in the alphabet comes first. If the first letters are the same, then the second letters are compared, and so on, until the order is decided.
In some contexts, numbers and letters are used not so much as a basis for establishing an ordering, but as a means of labeling items that are already ordered. For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way. Labeling series that may be used include ordinary Arabic numerals (1, 2, 3, ...), Roman numerals, or letters.
Common components of characters, called radicals in Chinese and logographic systems derived from Chinese, are identified. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation.
When strings contain spaces or other word dividers, the decision must be taken whether to ignore these dividers or to treat them as symbols preceding all other letters of the alphabet. For example, if the first approach is taken then "car park" will come after "carbon" and "carp" (as it would if it were written "carpark").
The radical-and-stroke system is cumbersome compared to an alphabetical system in which there are a few characters, all unambiguous. The choice of which components of a logograph comprise separate radicals and which radical is primary is not clear-cut. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs.
Strings representing numbers may be sorted based on the values of the numbers that they represent. For example, "−4", "2.5", "10", "89", "30,000". Pure application of this method may provide only a partial ordering on the strings, since different strings can represent the same number (as with "2" and "2.0" or, when scientific notation is used, "2e3" and "2000").
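Sorting numeric strings by value can be sketched with Python's `decimal` module. The parsing rules here are assumptions made for the example: a comma is taken to be a thousands separator and the character "−" (U+2212) a minus sign; other locales would need different rules.

```python
from decimal import Decimal


def numeric_value(text: str) -> Decimal:
    # Assumption: "," separates thousands and U+2212 is a minus sign.
    return Decimal(text.replace(",", "").replace("\u2212", "-"))


labels = ["89", "2.5", "30,000", "\u22124", "10"]
ordered = sorted(labels, key=numeric_value)
print(ordered)

# Different strings can represent the same number, so the value alone
# does not decide their relative order.
print(numeric_value("2") == numeric_value("2.0"))
print(numeric_value("2e3") == numeric_value("2000"))
```

Because `sorted` is stable, strings with equal values simply keep their input order; a secondary rule would be needed to order them definitively.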
When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled.
In code-based collation, the symbols are ordered in increasing numerical order of their codes, and this ordering is extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking, lexicographical ordering).
The ordering of the strings relies on the existence of a standard ordering for the letters of the alphabet in question. (The system is not limited to alphabets in the strict technical sense; languages that use a syllabary or abugida, for example Cherokee, can use the same ordering principle provided there is a set ordering for the symbols used.)
The main advantage of collation is that it makes it fast and easy for a user to find an element in the list, or to confirm that it is absent from the list. In automatic systems this can be done using a binary search algorithm or interpolation search.
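A minimal sketch of searching a collated list, using the standard-library `bisect` module for the binary search. The helper names `contains` and `with_prefix` are invented here; the prefix lookup illustrates finding elements in a range when only the first few letters are known.

```python
from bisect import bisect_left


def contains(sorted_items: list[str], target: str) -> bool:
    """Binary search: report whether target is present in a sorted list."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


def with_prefix(sorted_items: list[str], prefix: str) -> list[str]:
    """Return the contiguous run of items that start with prefix."""
    lo = bisect_left(sorted_items, prefix)
    # U+10FFFF is the highest code point, so this bound lies just past
    # every string that begins with the prefix.
    hi = bisect_left(sorted_items, prefix + "\U0010ffff")
    return sorted_items[lo:hi]


names = ["carbon", "carp", "cart", "carthorse", "dog"]
print(contains(names, "carp"), contains(names, "cat"))
print(with_prefix(names, "cart"))
```

Both helpers rely on the list already being sorted by the same comparison that the search uses.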
Collation differs from classification in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class.
In several languages the rules have changed over time, and so older dictionaries may use a different order than modern ones. Furthermore, collation may depend on use. For example, German dictionaries and telephone directories use different approaches.
Once the comparison has decided which of two strings should come before the other, and an order has been defined in this way, a sorting algorithm can be used to put a list of any number of items into that order.
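The division of labor between a pairwise comparison and a sorting algorithm can be sketched as follows. The `compare` function is a deliberately simple stand-in (a case-insensitive comparison), not the Unicode collation algorithm; `functools.cmp_to_key` adapts it for Python's built-in sort.

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    """Negative if a comes first, positive if b does, zero if they tie."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


titles = ["delta", "Alpha", "charlie", "Bravo"]

# The sorting algorithm only ever asks which of two items comes first.
ordered = sorted(titles, key=cmp_to_key(compare))
print(ordered)
```

Any other comparison with the same three-way contract could be substituted without changing the sorting step.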
Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though '7' comes after '1' in Unicode. This can be extended to Roman numerals.
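One common way to obtain this behavior for embedded integers is to split each string into digit and non-digit runs and compare the digit runs as numbers. The sketch below is one such approach under that assumption; `natural_key` is a name chosen for this example, and decimals, signs and Roman numerals are not handled.

```python
import re


def natural_key(text: str) -> list:
    # re.split with a capturing group alternates text and digit runs:
    # even positions hold text, odd positions hold digits.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


captions = ["Figure 11a", "Figure 7b", "Figure 7a"]
by_code = sorted(captions)
by_number = sorted(captions, key=natural_key)

print(by_code)
print(by_number)
```

Because text and digit runs always alternate from the same starting position, the keys of two strings are compared run against run of the same kind.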
The Unicode Collation Algorithm can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.
Roman-numeral labels take the form I, II, III, ... or i, ii, iii, ..., and letter labels the form A, B, C, ... or a, b, c, .... (An alternative method for indicating list items, without numbering them, is to use a bulleted list.)
Ordering embedded numbers numerically is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example, Microsoft Windows does this when sorting file names.
Where the strings used for collation differ from those displayed, two sets of strings can be stored, one for display purposes, and another for collation purposes. Strings used for collation in this way are called sort keys.
For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way.
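A surname-first ordering can be sketched by deriving an inverted form of each name and sorting on that. The rule used here, that the surname is the last space-separated word, is a simplifying assumption that fails for many real names; `surname_first` is a hypothetical helper.

```python
def surname_first(name: str) -> str:
    # Assumption: the surname is the last space-separated word.
    given, _, surname = name.rpartition(" ")
    return f"{surname}, {given}" if given else surname


people = ["Juan Hernandes", "Brian O'Leary", "Ana Alvarez"]
ordered = sorted(people, key=surname_first)

print([surname_first(p) for p in people])
print(ordered)
```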
Sometimes the same character used as a decimal point is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.
When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation algorithm.
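Under the spelled-out approach to numerals, 24 heures du Mans would be sorted as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as in 1337 for leet or Se7en for the movie title Seven, they may be sorted as if they were those letters.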
The thousands of symbols in such writing systems defy ordering by convention. Radical-and-stroke sorting instead identifies common components of characters.
For example, when a phonetic conversion is used, the kanji word Tōkyō can be sorted under its reading "to-u-ki-yo-u" (とうきょう), using the conventional sorting order for these characters.
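Sorting logographic words by a phonetic conversion amounts to sorting on a reading rather than on the written form. In the sketch below the `READINGS` table is a tiny hand-made stand-in for a real dictionary of readings, and comparing hiragana by code point is only an approximation of the conventional kana order.

```python
# Hypothetical lookup table: written form -> hiragana reading.
READINGS = {
    "東京": "とうきょう",  # Tōkyō
    "大阪": "おおさか",    # Ōsaka
    "京都": "きょうと",    # Kyōto
}


def reading_key(word: str) -> str:
    # Sort on the phonetic spelling instead of the kanji themselves.
    return READINGS[word]


places = ["東京", "大阪", "京都"]
ordered = sorted(places, key=reading_key)
print(ordered)
```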
Capital letters are typically treated as equivalent to their corresponding lowercase letters. (For alternative treatments in computerized systems, see Automated collation, below.)
In Spanish, the digraphs ch and ll are now alphabetized as two-letter combinations. A list of such conventions for various languages can be found in the Alphabetical order article, under its language-specific conventions.
Abbreviations may be treated as if they were spelt out in full. For example, names containing "St." (short for the English word Saint) are often ordered as if they were written out as "Saint". There is also a traditional convention in English that surnames beginning Mc and M' are listed as if those prefixes were written Mac.
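The conventions for "St." and for the prefixes Mc and M' can be sketched as a filing form computed before comparison. The regular expressions below are assumptions made for this example (for instance, that "St." always means Saint and never Street), and `filing_form` is an invented name.

```python
import re


def filing_form(name: str) -> str:
    # Expand the abbreviation "St." to "Saint".
    name = re.sub(r"\bSt\.\s*", "Saint ", name)
    # Treat the prefixes Mc and M' as if they were written Mac.
    name = re.sub(r"\bM(?:c|')(?=[A-Z])", "Mac", name)
    return name.casefold()


names = ["Sanders", "St. Clair", "Salter", "McAdam", "Mabey", "MacBeth"]
ordered = sorted(names, key=filing_form)
print(ordered)
```

The names themselves are unchanged; only the derived filing form is compared.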
In some applications, the strings by which items are collated may differ from the identifiers that are displayed. For example, The Shining might be sorted as Shining, The (see Alphabetical order above).
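Keeping one string for display and another for collation can be sketched by pairing each title with a derived key. The list of leading articles and the helper `sort_key` are assumptions for this example; a real catalog would need language-aware rules.

```python
# Hypothetical list of leading words to ignore when filing English titles.
ARTICLES = ("the ", "a ", "an ")


def sort_key(title: str) -> str:
    lowered = title.casefold()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Shining", "Psycho", "Alien"]

# Store (collation string, display string) pairs and sort on the first.
catalog = sorted((sort_key(t), t) for t in titles)
displayed = [title for _, title in catalog]
print(displayed)
```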
In many collation algorithms, the comparison is based not on the numerical codes of the characters, but with reference to the collating sequence.
Alphabetical order is the basis for many systems of collation where items of information are identified by strings consisting principally of letters from an alphabet.
In the second approach, "car park" will come before "carbon" and "carp". The first rule is used in many (but not all) dictionaries, the second in telephone directories.
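The two treatments of word dividers can be compared side by side. In this sketch the first approach removes spaces before comparison; the second relies on the fact that the space character has a lower code than any letter, so Python's default string comparison already places it first.

```python
words = ["carp", "car park", "carbon"]

# First approach: ignore the divider, as if written "carpark".
ignoring_spaces = sorted(words, key=lambda w: w.replace(" ", ""))

# Second approach: the divider precedes all letters of the alphabet.
space_first = sorted(words)

print(ignoring_spaces)
print(space_first)
```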
114: 1154: 1190: 824: 601: 197: 86: 67: 31: 780:
Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in
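The conflict between the two conventions can be made concrete with two key functions. Both are simplified assumptions for illustration: `german_key` folds umlauted vowels onto their base letters, and `turkish_key` ranks letters by their position in a hand-written copy of the Turkish alphabet. Neither is a full implementation of either language's rules.

```python
def german_key(word: str) -> str:
    # Dictionary-style convention: umlauted vowels file with the base vowel.
    return word.lower().translate(str.maketrans("äöü", "aou"))


TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH_RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}


def turkish_key(word: str) -> list[int]:
    # ö is a separate letter that follows o.
    return [TURKISH_RANK.get(ch, len(TURKISH_ALPHABET)) for ch in word]


german_words = ["olfaktorisch", "ökonomisch", "offenbar"]
turkish_words = ["öbür", "oyun"]

print(sorted(german_words, key=german_key))
print(sorted(turkish_words, key=turkish_key))
# Applying the German rule to the Turkish words gives the opposite order.
print(sorted(turkish_words, key=german_key))
```

No single key function satisfies both conventions, which is why a multilingual algorithm needs per-language tailoring.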
For example, the Chinese character 妈 (meaning "mother") is sorted as a six-stroke character under the three-stroke primary radical 女.
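Radical-and-stroke ordering can be sketched as sorting on a pair: the primary radical first, then the stroke count. The `ENTRIES` table below was entered by hand for illustration (radicals are identified by their Kangxi radical number, with total stroke counts) and should be treated as an assumption rather than reference data.

```python
# Hypothetical table: character -> (Kangxi radical number, total strokes).
ENTRIES = {
    "女": (38, 3),
    "妈": (38, 6),
    "好": (38, 6),
    "姐": (38, 8),
    "江": (85, 6),
    "河": (85, 8),
}


def radical_stroke_key(char: str) -> tuple[int, int]:
    # Group by primary radical, then order by stroke count within it.
    return ENTRIES[char]


chars = ["河", "妈", "江", "女", "姐", "好"]
ordered = sorted(chars, key=radical_stroke_key)
print(ordered)
```

Characters with the same radical and stroke count (here 妈 and 好) tie under this key and keep their input order; real dictionaries apply further conventions to separate them.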
These conventions are covered in detail in the Alphabetical order article. Algorithms that apply them are potentially quite complex, possibly requiring several passes through the text.
Sorting decimals properly is a bit more difficult, because different locales use different symbols for a decimal point.
Strings that represent personal names will often be listed by alphabetical order of surname, even if the given name comes first.
This consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order).
1167: 766: 665: 416: 299:
Certain limitations, complications, and special conventions may apply when alphabetical order is used:
82: 629: 625: 213: 149:
is the assembly of written information into a standard order. Many systems of collation are based on
38: 1001: 996: 447: 353: 313: 235: 30:
This article is about collation in library, information, and computer science. For other uses, see
In enumeration there are certain language-specific conventions as to which letters are used. For example, the Russian letters Ъ and Ь (which in writing are only used for modifying the preceding consonant) are omitted.
Telephone directories treat spaces as preceding all letters, so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo.
Surname stroke ordering is a convention in some official documents in Greater China where people's names are listed without hierarchy.
597: 392: 373: 304: 282:, can use the same ordering principle provided there is a set ordering for the symbols used.) 279: 263: 179: 1176:: Charts demonstrating language-specific sorting orders in various operating systems and DBMS 648:
The simplest kind of automated collation is based on the numerical codes of the symbols in a
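Ordering by numerical code can be observed directly in Python, where `ord` returns a character's code point and the default string comparison extends that ordering to whole strings. The sample symbols and words are chosen only for illustration.

```python
symbols = ["a", "b", "C", "d", "$"]

# Numerical codes of the individual symbols.
codes = {s: ord(s) for s in symbols}
print(codes)

# Symbols in increasing order of their codes.
print(sorted(symbols, key=ord))

# The same ordering extended to strings: capitals precede lower case.
words = ["apple", "Zebra", "Mango", "banana"]
print(sorted(words))
```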
Collation is a fundamental element of most office filing systems, library catalogs, and reference books.
A standard algorithm for collating any collection of strings composed of any standard Unicode symbols is the Unicode Collation Algorithm.
1203: 916: 876: 649: 245:
or other items that can be ordered chronologically or in some other natural fashion.
242: 37:"sortkey" redirects here. For Knowledge (XXG)'s usage of sortkeys in categories, see 955: 895: 549: 473: 189:
on a set of possible identifiers, called sort keys, which consequently produces a
1133: 923: 366: 186: 49: 613:(東京) can be sorted as if it were spelled out in the Japanese characters of the 443: 341: 309: 1067: 1151: 1056:
Bulletin of the School of Oriental and African Studies, University of London
1026: 939: 884: 641: 463: 271: 1025:
Historically, computers only handled text in uppercase (this dates back to
951: 947: 943: 935: 931: 769:, particular abbreviations, and so on, as mentioned above under 541: 383: 370:
would be sorted as if spelled out "seventeen seventy-six", and
545: 426:
were formerly (until 1994) treated as basic letters following
43: 938:(which in writing are only used for modifying the preceding 200:
defines an order through the process of comparing two given
A similar approach may be taken with strings representing dates.
Formally speaking, a collation method typically defines a total order.
Letters of an alphabet may also be used for this purpose of enumeration.
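In Russian enumeration, Ы, Й, and Ё are usually also omitted. Also, in many languages that use extended Latin script, the modified letters are often not used in enumeration.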
Alphabetical order § Language-specific conventions
Very common initial words, such as The in English, are often ignored for sorting purposes. So The Shining would be sorted as just "Shining" or "Shining, The".
Even when a title is sorted in this way, it may still be desired to display it as The Shining.
Languages have different conventions for treating modified letters and certain letter combinations. For example, in Spanish the letter ñ is treated as a basic letter following n.
Some Arabic dictionaries, such as Hans Wehr's bilingual A Dictionary of Modern Written Arabic, group and sort Arabic words by semitic root. For example, the words kitāba (كتابة 'writing'), kitāb (كتاب 'book'), kātib (كاتب 'writer'), maktaba (مكتبة 'library'), maktab (مكتب 'office'), maktūb (مكتوب 'fate,' or 'written'), are agglomerated under the triliteral root k-t-b (ك ت ب), which denotes 'writing'.
Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as the hanzi of Chinese and the kanji of Japanese.
In addition, Chinese characters can also be sorted by stroke-based sorting.
Under code-based ordering, a computer program might treat the characters a, b, C, d, and $ as being ordered $, C, a, b, d (the corresponding ASCII codes are $ = 36, a = 97, b = 98, C = 67, and d = 100). Therefore, strings beginning with C, M, or Z would be sorted before strings with lower-case a, b, etc. This is sometimes called ASCIIbetical order.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.