Knowledge (XXG)

Character encoding

Source 📝

285:, as well as in associated peripherals. Since the punched card code then in use only allowed digits, upper-case English letters and a few special characters, six bits were sufficient. These BCD encodings extended existing simple four-bit numeric encoding to include alphabetic and special characters, mapping them easily to punch-card encoding which was already in widespread use. IBM's codes were used primarily with IBM equipment; other computer vendors of the era had their own character codes, often six-bit, but usually had the ability to read tapes produced on IBM equipment. These BCD encodings were the precursors of IBM's 33: 227:
ASCII committee (which contained at least one member of the Fieldata committee, W. F. Leubbert), which addressed most of the shortcomings of Fieldata, using a simpler code. Many of the changes were subtle, such as collatable character sets within certain numeric ranges. ASCII63 was a success, widely adopted by industry, and with the follow-up issue of the 1967 ASCII code (which added lower-case letters and fixed some "control code" issues) ASCII67 was adopted fairly widely. ASCII67's American-centric nature was somewhat addressed in the European
235: 1165: 4320: 598:), how those code points are mapped to a series of fixed-size natural numbers (code units), and finally how those units are encoded as a stream of octets (bytes). The purpose of this decomposition is to establish a universal set of characters that can be encoded in a variety of ways. To describe this model precisely, Unicode uses its own set of terminology to describe its process: 370:
is the set of characters that can be represented by a particular coded character set. The repertoire may be closed, meaning that no additions are allowed without creating a new standard (as is the case with ASCII and most of the ISO-8859 series); or it may be open, allowing additions (as is the case
326:
Informally, the terms "character encoding", "character map", "character set" and "code page" are often used interchangeably. Historically, the same standard would specify a repertoire of characters and how they were to be encoded into a stream of code units — usually with a single character per code
245:
invented punch card data encoding in the late 19th century to analyze census data. Initially, each hole position represented a different data element, but later, numeric information was encoded by numbering the lower rows 0 to 9, with a punch in a column representing its row number. Later alphabetic
226:
code, a six-or seven-bit code, introduced by the U.S. Army Signal Corps. While Fieldata addressed many of the then-modern issues (e.g. letter and digit codes arranged for machine collation), it fell short of its goals and was short-lived. In 1963 the first ASCII code was released (X3.4-1963) by the
296:
In trying to develop universally interchangeable character encodings, researchers in the 1980s faced the dilemma that, on the one hand, it seemed necessary to add more bits to accommodate additional characters, but on the other hand, for the users of the relatively small character set of the Latin
1082:
Note in particular that 𐐀 is represented with either one 32-bit value (UTF-32), two 16-bit values (UTF-16), or four 8-bit values (UTF-8). Although each of those forms uses the same total number of bits (32) to represent the glyph, it is not obvious how the actual numeric byte values are related.
641:
to facilitate storage in a system that represents numbers as bit sequences of fixed length (i.e. practically any computer system). For example, a system that stores numeric information in 16-bit units can only directly represent code points 0 to 65,535 in each unit, but larger code points (say,
622:(each code point represents one character). For example, in a given repertoire, the capital letter "A" in the Latin alphabet might be represented by the code point 65, the character "B" by 66, and so on. Multiple coded character sets may share the same character repertoire; for example 524:
UTF-16: code units are twice as long as 8-bit code units. Therefore, any code point with a scalar value less than U+10000 is encoded with a single code unit. Code points with a value U+10000 or higher require two code units each. These pairs of code units have a unique term in UTF-16:
314:. Code points would then be represented in a variety of ways and with various default numbers of bits per character (code units) depending on context. To encode code points higher than the length of the code unit, such as above 256 for eight-bit units, the solution was to implement 309:
was to break the assumption (dating back to telegraph codes) that each character should always directly correspond to a particular sequence of bits. Instead, characters would first be mapped to a universal intermediate representation in the form of abstract numbers called
154:
The history of character codes illustrates the evolving need for machine-mediated character-based symbolic information over a distance, using once-novel electrical means. The earliest codes were based upon manual and hand-written encoding and cyphering systems, such as
179:, introduced in the 1840s, used a system of four "symbols" (short signal, long signal, short space, long space) to generate codes of variable length. Though some commercial use of Morse code was via machinery, it was often used as a manual code, generated by hand on a 297:
alphabet (who still constituted the majority of computer users), those additional bits were a colossal waste of then-scarce and expensive computing resources (as they would always be zeroed out for such users). In 1985, the average personal computer user's
1091:
As a result of having many character encoding methods in use (and the need for backward compatibility with archived data), many computer programs have been developed to translate data between character encoding schemes, a process known as
454:
Despite no longer referring to specific page numbers in a standard, many character encodings are still referred to by their code page number; likewise, the term "code page" is often still used to refer to character encodings in general.
175:, 1869). With the adoption of electrical and electro-mechanical techniques these earliest codes were adapted to the new capabilities and limitations of the early machines. The earliest well-known electrically transmitted character code, 458:
The term "code page" is not used in Unix or Linux, where "charmap" is preferred, usually in the larger context of locales. IBM's Character Data Representation Architecture (CDRA) designates entities with coded character set identifiers
221:
has been erroneously applied to ITA2 and its many variants. ITA2 suffered from many shortcomings and was often improved by many equipment manufacturers, sometimes creating compatibility issues. In 1959 the U.S. military defined its
754:
In Unicode, a character can be referred to as 'U+' followed by its codepoint value in hexadecimal. The range of valid code points (the codespace) for the Unicode standard is U+0000 to U+10FFFF, inclusive, divided in 17
404:
size of the character encoding). For example, common code units include 7-bit, 8-bit, 16-bit, and 32-bit. In some encodings, some characters are encoded using multiple code units; such an encoding is referred to as a
556:, there are two distinct approaches that can be taken to encode them: they can be encoded either as a single unified character (known as a precomposed character), or as separate characters that combine into a single 301:
could store only about 10 megabytes, and it cost approximately US$ 250 on the wholesale market (and much higher if purchased separately at retail), so it was very important at the time to make every bit count.
2289: 649:(CES) is the mapping of code units to a sequence of octets to facilitate storage on an octet-based file system or transmission over an octet-based network. Simple character encoding schemes include 206:(ASCII) and Unicode. Unicode, a well-defined and extensible encoding system, has supplanted most earlier character encodings, but the path of code development to the present is fairly well known. 3938: 119:) which represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of text in electronic form. 517:
A code point is represented by a sequence of code units. The mapping is defined by the encoding. Thus, the number of code units required to represent a code point depends on the encoding:
746:
The Unicode model uses the term "character map" for other systems which directly assign a sequence of characters to a sequence of bytes, covering all of the CCS, CEF and CES layers.
1342:
Central, Eastern and Southern European languages (Albanian, Bosnian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic)
2783: 1507: 605:(ACR) is the full set of abstract characters that a system supports. Unicode has an open repertoire, meaning that new characters will be added to the repertoire over time. 2620: 571:
variants is a choice that must be made when constructing a particular character encoding. Some writing systems, such as Arabic and Hebrew, need to accommodate things like
203: 889:). This string has several Unicode representations which are logically equivalent, yet while each is suited to a diverse set of circumstances or range of requirements: 1962: 1721:, part of the TRON project, is an encoding system that does not use Han Unification; instead, it uses "control codes" to switch between 16-bit "planes" of characters. 2728: 1497: 286: 2803: 2317: 1502: 217:
in 1870, patented in 1874, modified by Donald Murray in 1901, and standardized by CCITT as International Telegraph Alphabet No. 2 (ITA2) in 1930. The name
1181:
Especially, a comparison of the advantages and disadvantages of the few 3-5 most common character encodings (e.g. UTF-8, UTF-16 and UTF-32). You can help by
2155: 2243: 1421:
for Central European languages that use Latin script, (Polish, Czech, Slovak, Hungarian, Slovene, Serbian, Croatian, Bosnian, Romanian and Albanian)
4347: 4030: 3784: 2222: 690: 534:
GB 18030: multiple code units per code point are common, because of the small code units. Code points are mapped to one, two, or four code units.
4020: 1910: 1859: 694: 3769: 2723: 2229: 2024: 3903: 1128: 327:
unit. However, due to the emergence of more sophisticated character encodings, the distinction between these terms has become important.
164: 2191: 2173: 739:
character, particularly where there are regional variants that have been 'unified' in Unicode as the same character. An example is the
4297: 3808: 3611: 2355: 1724: 763:(BMP). This plane contains most commonly-used characters. Characters in the range U+10000 to U+10FFFF in the other planes are called 3853: 3469: 3464: 2967: 2798: 2310: 1522: 2080: 2887: 4105: 4040: 3794: 3774: 725: 115:
only. The low cost of digital representation of data in modern computer systems allows more elaborate character codes (such as
3858: 2451: 2345: 2290:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
560:. The former simplifies the text handling system, but the latter allows any letter/diacritic combination to be used in text. 3972: 3943: 3593: 2094: 2076: 642:
65,536 to 1.4 million) could be represented by using multiple 16-bit units. This correspondence is defined by a CEF.
32: 4035: 3923: 3883: 2303: 1204: 1155: 1110: 401: 123: 1710:– a system ("glyph set") that includes over 100,000 Chinese character drawings, modern and ancient, popular and obscure 431:
in the IBM standard character set manual, which would define a particular character encoding. Other vendors, including
4342: 4277: 3888: 3804: 3789: 3693: 3606: 3578: 3544: 1682: 1677: 3554: 3549: 4251: 4196: 4117: 3898: 3893: 2902: 250:
represented date internally by the timing of pulses relative to the motion of the cards through the machine. When
3958: 3913: 3749: 3298: 3002: 2947: 2912: 1830: 860: 359:
is a character set mapped to set of unique numbers. For historical reasons, this is also often referred to as a
3473: 2982: 2962: 2957: 2897: 2892: 2401: 1574: 764: 266: 4323: 4307: 4234: 4229: 4191: 4162: 4127: 3559: 3293: 2992: 2877: 1688: 590:, together constitute a unified standard for character encoding. Rather than mapping characters directly to 587: 406: 315: 191:
use. Most codes are of fixed per-character length or variable-length sequences of fixed-length codes (e.g.
89: 43:. Presence and absence of a hole represents 1 and 0, respectively; for example, "W" is encoded as "1010111". 258:
Electronic Multiplier, it used a variety of binary encoding schemes that were tied to the punch card code.
3918: 3908: 3764: 3754: 3288: 2997: 2442: 2429: 2365: 2213: 721: 713: 613: 188: 168: 759:, identified by the numbers 0 to 16. Characters in the range U+0000 to U+FFFF are in plane 0, called the 724:
with fixed-length UCS-2BE and maps Unicode code points to variable-length sequences of 16-bit words. See
4095: 3933: 3868: 3744: 3283: 2437: 894: 544: 531:
UTF-32: the 32-bit code unit is large enough that every code point is represented as a single code unit.
333: 54: 3313: 4256: 3928: 3818: 3688: 3308: 2133: 1903: 1564: 1312:
Western Europe with rationalised character set for Nordic languages, including complete Icelandic set
1216: 561: 262: 135: 3729: 1770: 575:
that are joined in different ways in different contexts, but represent the same semantic character.
318:
where an escape sequence would signal that subsequent bits should be parsed as a higher code point.
4211: 3838: 3323: 3208: 3198: 3193: 1798: 627: 594:, Unicode separately defines a coded character set that maps characters to unique natural numbers ( 278: 74: 4292: 4140: 3953: 3948: 3873: 2872: 2846: 2370: 1713: 1615: 1551: 1528: 440: 247: 66: 716:
with fixed-length ASCII and maps Unicode code points to variable-length sequences of octets, or
1515:
is a widely deployed standard for Japanese character encoding that has several encoding forms.
1145:
MultiByteToWideChar/WideCharToMultiByte – to convert from ANSI to Unicode & Unicode to ANSI
4282: 4221: 4201: 3863: 3843: 3823: 3451: 2927: 2907: 2419: 2235: 2225: 2020: 1412: 1135: 463:), each of which is variously called a "charset", "character set", "code page", or "CHARMAP". 444: 372: 234: 156: 1928: 4239: 3813: 3779: 3489: 3318: 2118: 2114: 1729: 1667: 1220: 1131:– A set of C and Java libraries to perform charset conversion. uconv can be used from ICU4C. 397: 242: 172: 139: 100: 4287: 4206: 2937: 2932: 2922: 2867: 2552: 2542: 2537: 2532: 2527: 2522: 2517: 2284: 2011: 1718: 1339: 1333: 1327: 1321: 1315: 1309: 760: 756: 686: 682: 396:
is the minimum bit combination that can represent a character in a character encoding (in
298: 282: 443:, also published their own sets of code pages; the most well-known code page suites are " 214: 2072: 1882: 1846:
in reality, you usually just assume UTF-8 since that is by far the most common encoding.
1125:– a program that converts encoding of input and output to programs running interactively 289:(usually abbreviated as EBCDIC), an eight-bit encoding scheme developed in 1963 for the 3739: 3734: 3724: 3719: 3714: 3709: 3673: 3668: 3661: 3656: 3651: 3646: 3641: 3636: 3631: 3626: 3621: 3616: 3484: 3441: 3436: 3431: 3426: 3421: 3416: 3411: 3406: 3401: 3396: 3391: 3386: 3381: 3376: 3371: 3278: 3273: 3268: 3263: 3258: 3253: 3248: 3243: 3238: 3233: 3228: 3223: 3007: 2987: 2592: 2512: 2507: 2502: 2497: 2492: 2487: 2482: 2477: 2472: 2340: 1303: 1297: 1291: 1285: 1279: 1273: 1267: 1261: 1255: 1208: 1099: 623: 349: 345: 290: 127: 108: 1164: 689:; compressing schemes try to minimize the number of bytes used per code unit (such as 4336: 4059: 3479: 3366: 3361: 3356: 3351: 3346: 3341: 3218: 3213: 3203: 3188: 3183: 3178: 3173: 3168: 3163: 3158: 3153: 3148: 3143: 3138: 3133: 3128: 3123: 3118: 3113: 3108: 3103: 3098: 3093: 3088: 3083: 3078: 3073: 3068: 3063: 3058: 3053: 3048: 3043: 3038: 3033: 3028: 3023: 2942: 2917: 2882: 2841: 2587: 1592: 1492: 1407: 1403: 1399: 1395: 1391: 1387: 1383: 1379: 1375: 1371: 1367: 1363: 1359: 1355: 1351: 1347: 471:
The code unit size is equivalent to the bit measurement for the particular encoding:
448: 184: 180: 1904:"IBM Electronic Data-Processing Machines Type 702 Preliminary Manual of Information" 1749: 863:
of the letters "ab̲c𐐀"—that is, a string containing a Unicode combining character (
4079: 4074: 4069: 4064: 3799: 3539: 3534: 3529: 3524: 3519: 3514: 3509: 3504: 3499: 3494: 2977: 2972: 2952: 2836: 2828: 2461: 1654: 1620: 1556: 1533: 1474: 1466: 1460: 1454: 1448: 1442: 1436: 1430: 1424: 1418: 1249: 678: 62: 36: 17: 2268: 1963:"What's the difference between an Encoding, Code Page, Character Set and Unicode?" 2394: 2377: 2279: 2110: 1694: 1232: 1106: 1093: 428: 199: 112: 104: 246:
data was encoded by allowing more than one punch per column. Electromechanical
4244: 4185: 4152: 4005: 3683: 2778: 2748: 2743: 2738: 2733: 2698: 2582: 2577: 2567: 2562: 2360: 2350: 1707: 1540: 1512: 950: 618: 595: 311: 176: 96: 81: 1732:– used in some applications when character encoding metadata is not available 4132: 4110: 4015: 3828: 2857: 2788: 2768: 2763: 2688: 2683: 1609: 1546: 1518: 735:
which supplies additional information to select the particular variant of a
553: 432: 419: 360: 85: 70: 521:
UTF-8: code points map to a sequence of one, two, three or four code units.
4302: 4157: 4122: 4100: 4010: 3833: 2773: 2758: 2718: 2713: 2708: 2693: 2652: 2647: 2642: 2637: 2632: 2627: 2424: 2414: 2410: 2384: 2285:
Decimal, Hexadecimal Character Codes in HTML Unicode – Encoding converter
2102: 1860:"Ancient Computer Character Code Tables – and Why They're Still Relevant" 1700: 1672: 1580: 1276:
Western Europe and Baltic countries (Lithuania, Estonia, Latvia and Lapp)
920: 717: 705: 701: 666: 662: 658: 654: 572: 549:
Exactly what constitutes a character varies between character encodings.
491: 223: 77: 58: 51: 2050: 4172: 3968: 3878: 3759: 3333: 2703: 2678: 2668: 2406: 2269:
Character sets registered by Internet Assigned Numbers Authority (IANA)
1627: 1569: 736: 583: 274: 270: 269:) six-bit character encoding schemes, starting as early as 1953 in its 255: 192: 160: 116: 80:. The numerical values that make up a character encoding are known as " 2295: 198:
Common examples of character encoding systems include Morse code, the
4177: 4167: 4145: 4025: 4000: 3995: 3848: 3678: 3569: 3459: 2818: 2808: 2793: 2610: 1643: 1638: 1630:(and subsets thereof, such as the 16-bit 'Basic Multilingual Plane') 1483: 1479: 1244: 1224: 1113:. On Firefox 3, for example, see the View/Character Encoding submenu. 674: 670: 630:
all cover the same repertoire but map them to different code points.
526: 505: 498: 487: 436: 344:
is a collection of elements used to represent text. For example, the
228: 143: 293:
that featured a larger character set, including lower case letters.
1270:
Western Europe and South European (Turkish, Maltese plus Esperanto)
4272: 3990: 3985: 3980: 3597: 3303: 2813: 2753: 2615: 2389: 2106: 1650: 1633: 1599: 1237: 1212: 1116: 709: 650: 568: 557: 483: 476: 460: 389:
is the range of numerical values spanned by a coded character set.
233: 131: 40: 31: 3583: 2673: 2239: 1993: 1588: 1487: 1122: 1036: 708:
are simpler CESes, most systems working with Unicode use either
591: 95:
Early character codes associated with the optical or electrical
2299: 382:
is a value or position of a character in a coded character set.
4050: 2098: 1159: 740: 251: 210: 2273: 2101:
character sets, which conform to the structure and rules of
1771:"Usage Survey of Character Encodings broken down by Ranking" 1336:
Added the Euro sign and other rationalisations to ISO 8859-1
424:"Code page" is a historical name for a coded character set. 1831:"Character encoding for iOS developers. Or UTF-8 what now?" 1814:
Android note: The Android platform default is always UTF-8.
1215:, used in 98.2% of surveyed web sites, as of May 2024. In 134:, used in 98.2% of surveyed web sites, as of May 2024. In 770:
The following table shows examples of code point values:
238:
Hollerith 80-column punch card with EBCDIC character set
99:
could only represent a subset of the characters used in
1182: 167:, and the 4-digit encoding of Chinese characters for a 2280:
Unicode Technical Report #17: Character Encoding Model
2013:
The Unicode Standard Version 15.0 – Core Specification
305:
The compromise solution that was eventually found and
1119:– a program and standardized API to convert encodings 1765: 1763: 447:" (based on Windows-1252) and "IBM"/"DOS" (based on 4265: 4220: 4088: 4049: 3967: 3702: 3592: 3568: 3450: 3332: 3016: 2855: 2827: 2661: 2603: 2460: 2333: 1988: 1986: 1984: 1982: 1980: 1978: 1976: 1685:– articles related to character encoding in general 1612:
is a Korean double-byte character encoding standard
1074: 1070: 1066: 1062: 1058: 1054: 1050: 1046: 1042: 1028: 1024: 1020: 1016: 1012: 1008: 998: 994: 990: 986: 982: 972: 968: 964: 960: 956: 942: 938: 934: 930: 926: 912: 908: 904: 900: 681:, switch between several simple schemes by using a 2049:Whistler, Ken; Freytag, Asmus (11 November 2022). 337:is a minimal unit of text that has semantic value. 204:American Standard Code for Information Interchange 1691:– articles detailing specific character encodings 1306:Western Europe with amended Turkish character set 254:went to electronic processing, starting with the 4252:Unicode control, format and separator characters 1330:Celtic languages (Irish Gaelic, Scottish, Welsh) 979:Five UTF-32 code units (32-bit integer values): 2221:. The Systems Programming Series (1 ed.). 2192:"WideCharToMultiByte function (stringapiset.h)" 2174:"MultiByteToWideChar function (stringapiset.h)" 669:; compound character encoding schemes, such as 427:Originally, a code page referred to a specific 84:" and collectively comprise a "code space", a " 1883:"An annotated history of some character codes" 287:Extended Binary-Coded Decimal Interchange Code 2311: 2215:Coded Character Sets, History and Development 2073:"VT510 Video Terminal Programmer Information" 1109:– most modern web browsers feature automatic 586:and its parallel standard, the ISO/IEC 10646 8: 2005: 2003: 1956: 1954: 1952: 1950: 2113:in IBM's standard character set manual) in 39:with the word "Knowledge (XXG)" encoded in 2318: 2304: 2296: 2051:"UTR#17: Unicode Character Encoding Model" 772: 27:Using numbers to represent text characters 183:and decipherable by ear, and persists in 2109:supports a number of IBM PC code pages ( 2044: 2042: 2040: 2038: 2036: 1929:"IBM Drives Hard Disks to New Standards" 1035:Nine UTF-8 code units (8-bit values, or 1005:Six UTF-16 code units (16-bit integers) 2223:Addison-Wesley Publishing Company, Inc. 2079:(DEC). 7.1. Character Sets - Overview. 1935:. Popular Computing Inc. pp. 29–33 1741: 637:(CEF) is the mapping of code points to 57:, especially the written characters of 50:is the process of assigning numbers to 2160:Microsoft .NET Framework Class Library 2019:. Unicode Consortium. September 2022. 1543:is an extended version of JIS X 0208. 1175: with: Popularity and comparison: 876:) as well a supplementary character ( 371:with Unicode and to a limited extent 7: 2083:from the original on 26 January 2016 1916:from the original on 9 October 2022. 1824: 1822: 1793: 1791: 1591:(a more famous variant is Microsoft 1129:International Components for Unicode 165:international maritime signal flags 3662:Norwegian and Danish (alternative) 2134:"Terminology (The Java Tutorials)" 1725:Universal Character Set characters 25: 2249:from the original on May 26, 2016 1829:Galloway, Matt (9 October 2012). 1096:. Some of these are cited below. 4319: 4318: 1927:Strelho, Kevin (15 April 1985). 1163: 4106:Digital encoding of APL symbols 4041:Comparison of Unicode encodings 2559:Proposed but not approved 1909:. 1954. p. 80. 22-6173-1. 1858:Tom Henderson (17 April 2014). 1750:"Character Encoding Definition" 726:comparison of Unicode encodings 4348:Natural language and computing 2292:by Joel Spolsky (Oct 10, 2003) 2212:Mackenzie, Charles E. (1980). 1961:Shawn Steele (15 March 2005). 552:For example, for letters with 1: 2077:Digital Equipment Corporation 1881:Tom Jennings (1 March 2010). 887:DESERET CAPITAL LETTER LONG I 603:abstract character repertoire 1324:Baltic languages plus Polish 1205:most used character encoding 1156:Popularity of text encodings 1111:character encoding detection 277:computers, and in its later 124:most used character encoding 4278:Character encodings in HTML 3612:National Replacement (NRCS) 3579:Japanese language in EBCDIC 2093:In addition to traditional 1994:"Glossary of Unicode Terms" 1695:Hexadecimal representations 1683:Category:Character encoding 1678:Character encodings in HTML 1142:Encoding.Convert – .NET API 728:for a detailed discussion. 626:and IBM code pages 037 and 4364: 2010:"Chapter 3: Conformance". 1525:is a dialect of Shift_JIS) 1264:Western and Central Europe 1153: 1150:Common character encodings 831:Inverted exclamation mark 542: 527:"Unicode surrogate pairs". 417: 103:, sometimes restricted to 4316: 2156:"Encoding.Convert Method" 2121:of industry-standard PCs. 1754:The Tech Terms Dictionary 1577:(Microsoft Code page 936) 1413:MS-Windows character sets 647:character encoding scheme 316:variable-length encodings 213:encoding, was created by 4308:Variable-length encoding 4089:Miscellaneous code pages 2847:Extended Unix Code / EUC 2538:-15 (New Western Europe) 2334:Early telecommunications 2274:Characters and encodings 1178:Statistics on popularity 765:supplementary characters 761:Basic Multilingual Plane 731:Finally, there may be a 616:that maps characters to 352:are both character sets. 209:The Baudot code, a five- 4235:C0 and C1 control codes 1689:Category:Character sets 635:character encoding form 588:Universal Character Set 564:pose similar problems. 407:variable-width encoding 2483:-3 (Maltese/Esperanto) 2434:World System Teletext 1704:– character set mismap 1427:for Cyrillic alphabets 1223:tasks, both UTF-8 and 579:Unicode encoding model 567:Exactly how to handle 307:developed into Unicode 239: 169:Chinese telegraph code 142:tasks, both UTF-8 and 61:, allowing them to be 44: 4257:Whitespace characters 3934:Ventura International 1996:. Unicode Consortium. 1433:for Western languages 1227:are popular options. 733:higher-level protocol 545:Character (computing) 237: 146:are popular options. 35: 3652:Norwegian and Danish 2117:mode to emulate the 2053:. Unicode Consortium 1756:. 24 September 2010. 1463:for Baltic languages 1217:application programs 743:attribute xml:lang. 508:consists of 32 bits. 501:consists of 16 bits; 368:character repertoire 263:Binary Coded Decimal 136:application programs 4212:Unified Hangul Code 3884:PostScript Standard 3607:Multinational (MCS) 2478:-2 (Central Europe) 2473:-1 (Western Europe) 2327:Character encodings 895:composed characters 779:Unicode code point 750:Unicode code points 722:backward compatible 714:backward compatible 610:coded character set 494:consists of 8 bits; 479:consists of 7 bits; 357:coded character set 248:tabulating machines 18:Coded character set 4343:Character encoding 4293:Hardware code page 4053:typesetting system 3889:PostScript Latin 1 3545:Cyrillic + Finnish 3452:Windows code pages 3334:IBM AIX code pages 2662:National standards 2593:Ukrainian Cyrillic 2276:, by Jukka Korpela 2180:. 13 October 2021. 1835:www.galloway.me.uk 1803:Android Developers 1714:Presentation layer 874:COMBINING LOW LINE 441:Oracle Corporation 373:Windows code pages 240: 105:upper case letters 48:Character encoding 45: 4330: 4329: 4283:Charset detection 4222:Control character 3904:Sharp calculators 3775:Casio calculators 3703:Platform specific 3555:Cyrillic + German 3550:Cyrillic + French 2968:Maltese/Esperanto 2604:Bibliographic use 2488:-4 (North Europe) 2420:T.51/ISO/IEC 6937 2378:Baudot and Murray 2231:978-0-201-14460-4 2026:978-1-936213-32-0 1282:Cyrillic alphabet 1201: 1200: 852: 851: 400:terms, it is the 261:IBM used several 101:written languages 16:(Redirected from 4355: 4322: 4321: 3814:DG International 3689:Special Graphics 3490:Extended Latin-8 2888:Central European 2878:Barents Cyrillic 2583:Barents Cyrillic 2553:-12 (Devanagari) 2549:Abandoned parts 2320: 2313: 2306: 2297: 2258: 2256: 2254: 2248: 2220: 2200: 2199: 2198:. 9 August 2022. 2188: 2182: 2181: 2170: 2164: 2163: 2152: 2146: 2145: 2143: 2141: 2130: 2124: 2123: 2119:console terminal 2090: 2088: 2069: 2063: 2062: 2060: 2058: 2046: 2031: 2030: 2018: 2007: 1998: 1997: 1990: 1971: 1970: 1958: 1945: 1944: 1942: 1940: 1924: 1918: 1917: 1915: 1908: 1900: 1894: 1893: 1891: 1889: 1878: 1872: 1871: 1869: 1867: 1855: 1849: 1848: 1843: 1841: 1826: 1817: 1816: 1811: 1809: 1795: 1786: 1785: 1783: 1781: 1767: 1758: 1757: 1746: 1730:Charset sniffing 1668:Percent-encoding 1557:ISO-2022-JP-2004 1221:operating system 1194: 1191: 1167: 1160: 1076: 1072: 1068: 1064: 1060: 1056: 1052: 1048: 1044: 1030: 1026: 1022: 1018: 1014: 1010: 1000: 996: 992: 988: 984: 974: 970: 966: 962: 958: 944: 940: 936: 932: 928: 914: 910: 906: 902: 888: 885: 882: 880: 875: 872: 869: 867: 773: 687:escape sequences 398:computer science 308: 243:Herman Hollerith 173:Hans Schjellerup 140:operating system 21: 4363: 4362: 4358: 4357: 4356: 4354: 4353: 4352: 4333: 4332: 4331: 4326: 4312: 4288:Han unification 4261: 4216: 4084: 4045: 3963: 3785:Compucolor 8001 3698: 3694:Technical (TCS) 3617:French Canadian 3588: 3564: 3560:Polytonic Greek 3446: 3328: 3012: 2998:Turkic Cyrillic 2913:Font X (Kermit) 2908:Farsi (Persian) 2860: 2851: 2823: 2657: 2599: 2469:Approved parts 2456: 2329: 2324: 2265: 2252: 2250: 2246: 2232: 2218: 2211: 2208: 2206:Further reading 2203: 2190: 2189: 2185: 2172: 2171: 2167: 2154: 2153: 2149: 2139: 2137: 2132: 2131: 2127: 2086: 2084: 2071: 2070: 2066: 2056: 2054: 2048: 2047: 2034: 2027: 2016: 2009: 2008: 2001: 1992: 1991: 1974: 1960: 1959: 1948: 1938: 1936: 1926: 1925: 1921: 1913: 1906: 1902: 1901: 1897: 1887: 1885: 1880: 1879: 1875: 1865: 1863: 1857: 1856: 1852: 1839: 1837: 1828: 1827: 1820: 1807: 1805: 1797: 1796: 1789: 1779: 1777: 1769: 1768: 1761: 1748: 1747: 1743: 1739: 1664: 1659: 1197: 1189: 1186: 1173:needs expansion 1158: 1152: 1089: 886: 883: 878: 877: 873: 870: 865: 864: 857: 752: 683:byte order mark 581: 547: 541: 515: 504:A code unit in 497:A code unit in 482:A code unit in 475:A code unit in 469: 422: 416: 324: 306: 299:hard disk drive 152: 28: 23: 22: 15: 12: 11: 5: 4361: 4359: 4351: 4350: 4345: 4335: 4334: 4328: 4327: 4324:Character sets 4317: 4314: 4313: 4311: 4310: 4305: 4300: 4295: 4290: 4285: 4280: 4275: 4269: 4267: 4266:Related topics 4263: 4262: 4260: 4259: 4254: 4249: 4248: 4247: 4242: 4232: 4230:Morse prosigns 4226: 4224: 4218: 4217: 4215: 4214: 4209: 4204: 4199: 4194: 4189: 4182: 4181: 4180: 4175: 4170: 4160: 4155: 4150: 4149: 4148: 4143: 4135: 4130: 4125: 4120: 4115: 4114: 4113: 4103: 4098: 4092: 4090: 4086: 4085: 4083: 4082: 4077: 4072: 4067: 4062: 4056: 4054: 4047: 4046: 4044: 4043: 4038: 4033: 4028: 4023: 4018: 4013: 4008: 4003: 3998: 3993: 3988: 3983: 3977: 3975: 3965: 3964: 3962: 3961: 3956: 3951: 3946: 3941: 3936: 3931: 3926: 3924:TI calculators 3921: 3916: 3911: 3906: 3901: 3896: 3891: 3886: 3881: 3876: 3871: 3866: 3861: 3856: 3851: 3846: 3841: 3836: 3831: 3826: 3821: 3816: 3811: 3802: 3797: 3792: 3787: 3782: 3777: 3772: 3767: 3762: 3757: 3752: 3747: 3742: 3737: 3732: 3727: 3722: 3717: 3712: 3706: 3704: 3700: 3699: 3697: 3696: 3691: 3686: 3681: 3676: 3671: 3666: 3665: 3664: 3659: 3654: 3649: 3644: 3639: 3634: 3632:United Kingdom 3629: 3624: 3619: 3609: 3603: 3601: 3590: 3589: 3587: 3586: 3581: 3575: 3573: 3566: 3565: 3563: 3562: 3557: 3552: 3547: 3542: 3537: 3532: 3527: 3522: 3517: 3512: 3507: 3502: 3497: 3492: 3487: 3482: 3477: 3467: 3462: 3456: 3454: 3448: 3447: 3445: 3444: 3439: 3434: 3429: 3424: 3419: 3414: 3409: 3404: 3399: 3394: 3389: 3384: 3379: 3374: 3369: 3364: 3359: 3354: 3349: 3344: 3338: 3336: 3330: 3329: 3327: 3326: 3321: 3316: 3311: 3306: 3301: 3296: 3291: 3286: 3281: 3276: 3271: 3266: 3261: 3256: 3251: 3246: 3241: 3236: 3231: 3226: 3221: 3216: 3211: 3206: 3201: 3196: 3191: 3186: 3181: 3176: 3171: 3166: 3161: 3156: 3151: 3146: 3141: 3136: 3131: 3126: 3121: 3116: 3111: 3106: 3101: 3096: 3091: 3086: 3081: 3076: 3071: 3066: 3061: 3056: 3051: 3046: 3041: 3036: 3031: 3026: 3020: 3018: 3017:DOS code pages 3014: 3013: 3011: 3010: 3005: 3000: 2995: 2990: 2985: 2980: 2975: 2970: 2965: 2963:Latin (Kermit) 2960: 2955: 2950: 2945: 2940: 2935: 2930: 2925: 2920: 2915: 2910: 2905: 2900: 2895: 2890: 2885: 2880: 2875: 2870: 2864: 2862: 2853: 2852: 2850: 2849: 2844: 2839: 2833: 2831: 2825: 2824: 2822: 2821: 2816: 2811: 2806: 2801: 2796: 2791: 2786: 2781: 2776: 2771: 2766: 2761: 2756: 2751: 2746: 2741: 2736: 2731: 2726: 2721: 2716: 2711: 2706: 2701: 2696: 2691: 2686: 2681: 2676: 2671: 2665: 2663: 2659: 2658: 2656: 2655: 2650: 2645: 2640: 2635: 2630: 2625: 2624: 2623: 2618: 2607: 2605: 2601: 2600: 2598: 2597: 2596: 2595: 2590: 2585: 2580: 2572: 2571: 2570: 2565: 2563:KOI-8 Cyrillic 2557: 2556: 2555: 2547: 2546: 2545: 2543:-16 (Romanian) 2540: 2535: 2530: 2525: 2520: 2515: 2510: 2505: 2500: 2495: 2490: 2485: 2480: 2475: 2466: 2464: 2458: 2457: 2455: 2454: 2449: 2448: 2447: 2446: 2445: 2440: 2432: 2427: 2422: 2404: 2399: 2398: 2397: 2387: 2382: 2381: 2380: 2375: 2374: 2373: 2368: 2363: 2358: 2348: 2341:Telegraph code 2337: 2335: 2331: 2330: 2325: 2323: 2322: 2315: 2308: 2300: 2294: 2293: 2287: 2282: 2277: 2271: 2264: 2263:External links 2261: 2260: 2259: 2230: 2207: 2204: 2202: 2201: 2196:Microsoft Docs 2183: 2178:Microsoft Docs 2165: 2147: 2125: 2064: 2032: 2025: 1999: 1972: 1967:Microsoft Docs 1946: 1919: 1895: 1873: 1850: 1818: 1787: 1759: 1740: 1738: 1735: 1734: 1733: 1727: 1722: 1716: 1711: 1705: 1697: 1692: 1686: 1680: 1675: 1670: 1663: 1660: 1658: 1657: 1648: 1647: 1646: 1641: 1636: 1625: 1624: 1623: 1618: 1613: 1604: 1603: 1602: 1585: 1584: 1583: 1578: 1572: 1561: 1560: 1559: 1554: 1549: 1547:Shift_JIS-2004 1538: 1537: 1536: 1531: 1526: 1510: 1505: 1500: 1495: 1490: 1477: 1472: 1471: 1470: 1469:for Vietnamese 1464: 1458: 1452: 1446: 1440: 1434: 1428: 1422: 1410: 1345: 1344: 1343: 1337: 1331: 1325: 1319: 1313: 1307: 1301: 1295: 1289: 1283: 1277: 1271: 1265: 1259: 1258:Western Europe 1247: 1242: 1241: 1240: 1229: 1199: 1198: 1196: 1195: 1179: 1170: 1168: 1154:Main article: 1151: 1148: 1147: 1146: 1143: 1133: 1132: 1126: 1120: 1114: 1100:Cross-platform 1088: 1085: 1080: 1079: 1078: 1077: 1033: 1032: 1031: 1003: 1002: 1001: 977: 976: 975: 947: 946: 945: 917: 916: 915: 856: 853: 850: 849: 846: 843: 839: 838: 835: 832: 828: 827: 824: 821: 817: 816: 813: 810: 806: 805: 802: 799: 798:Latin sharp S 795: 794: 791: 788: 784: 783: 780: 777: 751: 748: 624:ISO/IEC 8859-1 580: 577: 543:Main article: 540: 537: 536: 535: 532: 529: 522: 514: 511: 510: 509: 502: 495: 480: 468: 465: 418:Main article: 415: 412: 411: 410: 390: 383: 376: 364: 353: 350:Greek alphabet 346:Latin alphabet 338: 323: 320: 291:IBM System/360 157:Bacon's cipher 151: 148: 59:human language 26: 24: 14: 13: 10: 9: 6: 4: 3: 2: 4360: 4349: 4346: 4344: 4341: 4340: 4338: 4325: 4315: 4309: 4306: 4304: 4301: 4299: 4296: 4294: 4291: 4289: 4286: 4284: 4281: 4279: 4276: 4274: 4271: 4270: 4268: 4264: 4258: 4255: 4253: 4250: 4246: 4243: 4241: 4238: 4237: 4236: 4233: 4231: 4228: 4227: 4225: 4223: 4219: 4213: 4210: 4208: 4205: 4203: 4200: 4198: 4195: 4193: 4190: 4188: 4187: 4183: 4179: 4176: 4174: 4171: 4169: 4166: 4165: 4164: 4161: 4159: 4156: 4154: 4151: 4147: 4144: 4142: 4139: 4138: 4136: 4134: 4131: 4129: 4126: 4124: 4121: 4119: 4116: 4112: 4109: 4108: 4107: 4104: 4102: 4099: 4097: 4094: 4093: 4091: 4087: 4081: 4078: 4076: 4073: 4071: 4068: 4066: 4063: 4061: 4058: 4057: 4055: 4052: 4048: 4042: 4039: 4037: 4034: 4032: 4029: 4027: 4024: 4022: 4019: 4017: 4014: 4012: 4009: 4007: 4004: 4002: 3999: 3997: 3994: 3992: 3989: 3987: 3984: 3982: 3979: 3978: 3976: 3974: 3973:ISO/IEC 10646 3970: 3966: 3960: 3957: 3955: 3952: 3950: 3947: 3945: 3942: 3940: 3937: 3935: 3932: 3930: 3927: 3925: 3922: 3920: 3917: 3915: 3912: 3910: 3907: 3905: 3902: 3900: 3897: 3895: 3892: 3890: 3887: 3885: 3882: 3880: 3877: 3875: 3872: 3870: 3867: 3865: 3862: 3860: 3857: 3855: 3852: 3850: 3847: 3845: 3842: 3840: 3837: 3835: 3832: 3830: 3827: 3825: 3822: 3820: 3817: 3815: 3812: 3810: 3806: 3803: 3801: 3798: 3796: 3793: 3791: 3790:Compucolor II 3788: 3786: 3783: 3781: 3778: 3776: 3773: 3771: 3768: 3766: 3763: 3761: 3758: 3756: 3753: 3751: 3748: 3746: 3745:Acorn RISC OS 3743: 3741: 3738: 3736: 3733: 3731: 3728: 3726: 3723: 3721: 3718: 3716: 3713: 3711: 3708: 3707: 3705: 3701: 3695: 3692: 3690: 3687: 3685: 3682: 3680: 3677: 3675: 3674:8-bit Turkish 3672: 3670: 3667: 3663: 3660: 3658: 3655: 3653: 3650: 3648: 3645: 3643: 3640: 3638: 3635: 3633: 3630: 3628: 3625: 3623: 3620: 3618: 3615: 3614: 3613: 3610: 3608: 3605: 3604: 3602: 3599: 3595: 3591: 3585: 3582: 3580: 3577: 3576: 3574: 3571: 3567: 3561: 3558: 3556: 3553: 3551: 3548: 3546: 3543: 3541: 3538: 3536: 3533: 3531: 3528: 3526: 3523: 3521: 3518: 3516: 3513: 3511: 3508: 3506: 3503: 3501: 3498: 3496: 3493: 3491: 3488: 3486: 3483: 3481: 3478: 3475: 3471: 3468: 3466: 3463: 3461: 3458: 3457: 3455: 3453: 3449: 3443: 3440: 3438: 3435: 3433: 3430: 3428: 3425: 3423: 3420: 3418: 3415: 3413: 3410: 3408: 3405: 3403: 3400: 3398: 3395: 3393: 3390: 3388: 3385: 3383: 3380: 3378: 3375: 3373: 3370: 3368: 3365: 3363: 3360: 3358: 3355: 3353: 3350: 3348: 3345: 3343: 3340: 3339: 3337: 3335: 3331: 3325: 3322: 3320: 3317: 3315: 3312: 3310: 3307: 3305: 3302: 3300: 3297: 3295: 3292: 3290: 3287: 3285: 3282: 3280: 3277: 3275: 3272: 3270: 3267: 3265: 3262: 3260: 3257: 3255: 3252: 3250: 3247: 3245: 3242: 3240: 3237: 3235: 3232: 3230: 3227: 3225: 3222: 3220: 3217: 3215: 3212: 3210: 3207: 3205: 3202: 3200: 3197: 3195: 3192: 3190: 3187: 3185: 3182: 3180: 3177: 3175: 3172: 3170: 3167: 3165: 3162: 3160: 3157: 3155: 3152: 3150: 3147: 3145: 3142: 3140: 3137: 3135: 3132: 3130: 3127: 3125: 3122: 3120: 3117: 3115: 3112: 3110: 3107: 3105: 3102: 3100: 3097: 3095: 3092: 3090: 3087: 3085: 3082: 3080: 3077: 3075: 3072: 3070: 3067: 3065: 3062: 3060: 3057: 3055: 3052: 3050: 3047: 3045: 3042: 3040: 3037: 3035: 3032: 3030: 3027: 3025: 3022: 3021: 3019: 3015: 3009: 3006: 3004: 3001: 2999: 2996: 2994: 2991: 2989: 2986: 2984: 2981: 2979: 2976: 2974: 2971: 2969: 2966: 2964: 2961: 2959: 2956: 2954: 2951: 2949: 2946: 2944: 2941: 2939: 2936: 2934: 2931: 2929: 2926: 2924: 2921: 2919: 2916: 2914: 2911: 2909: 2906: 2904: 2901: 2899: 2896: 2894: 2891: 2889: 2886: 2884: 2881: 2879: 2876: 2874: 2871: 2869: 2866: 2865: 2863: 2859: 2854: 2848: 2845: 2843: 2842:ISO/IEC 10367 2840: 2838: 2835: 2834: 2832: 2830: 2826: 2820: 2817: 2815: 2812: 2810: 2807: 2805: 2802: 2800: 2797: 2795: 2792: 2790: 2787: 2785: 2782: 2780: 2777: 2775: 2772: 2770: 2767: 2765: 2762: 2760: 2757: 2755: 2752: 2750: 2747: 2745: 2742: 2740: 2737: 2735: 2732: 2730: 2727: 2725: 2722: 2720: 2717: 2715: 2712: 2710: 2707: 2705: 2702: 2700: 2697: 2695: 2692: 2690: 2687: 2685: 2682: 2680: 2677: 2675: 2672: 2670: 2667: 2666: 2664: 2660: 2654: 2651: 2649: 2646: 2644: 2641: 2639: 2636: 2634: 2631: 2629: 2626: 2622: 2619: 2617: 2614: 2613: 2612: 2609: 2608: 2606: 2602: 2594: 2591: 2589: 2586: 2584: 2581: 2579: 2576: 2575: 2573: 2569: 2566: 2564: 2561: 2560: 2558: 2554: 2551: 2550: 2548: 2544: 2541: 2539: 2536: 2534: 2531: 2529: 2526: 2524: 2521: 2519: 2516: 2514: 2511: 2509: 2506: 2504: 2501: 2499: 2496: 2494: 2493:-5 (Cyrillic) 2491: 2489: 2486: 2484: 2481: 2479: 2476: 2474: 2471: 2470: 2468: 2467: 2465: 2463: 2459: 2453: 2450: 2444: 2441: 2439: 2436: 2435: 2433: 2431: 2428: 2426: 2423: 2421: 2418: 2417: 2416: 2412: 2408: 2405: 2403: 2400: 2396: 2393: 2392: 2391: 2388: 2386: 2383: 2379: 2376: 2372: 2369: 2367: 2364: 2362: 2359: 2357: 2354: 2353: 2352: 2349: 2347: 2344: 2343: 2342: 2339: 2338: 2336: 2332: 2328: 2321: 2316: 2314: 2309: 2307: 2302: 2301: 2298: 2291: 2288: 2286: 2283: 2281: 2278: 2275: 2272: 2270: 2267: 2266: 2262: 2245: 2241: 2237: 2233: 2227: 2224: 2217: 2216: 2210: 2209: 2205: 2197: 2193: 2187: 2184: 2179: 2175: 2169: 2166: 2161: 2157: 2151: 2148: 2135: 2129: 2126: 2122: 2120: 2116: 2112: 2108: 2104: 2100: 2096: 2082: 2078: 2074: 2068: 2065: 2052: 2045: 2043: 2041: 2039: 2037: 2033: 2028: 2022: 2015: 2014: 2006: 2004: 2000: 1995: 1989: 1987: 1985: 1983: 1981: 1979: 1977: 1973: 1968: 1964: 1957: 1955: 1953: 1951: 1947: 1934: 1930: 1923: 1920: 1912: 1905: 1899: 1896: 1884: 1877: 1874: 1861: 1854: 1851: 1847: 1836: 1832: 1825: 1823: 1819: 1815: 1804: 1800: 1794: 1792: 1788: 1776: 1772: 1766: 1764: 1760: 1755: 1751: 1745: 1742: 1736: 1731: 1728: 1726: 1723: 1720: 1717: 1715: 1712: 1709: 1706: 1703: 1702: 1698: 1696: 1693: 1690: 1687: 1684: 1681: 1679: 1676: 1674: 1671: 1669: 1666: 1665: 1661: 1656: 1652: 1649: 1645: 1642: 1640: 1637: 1635: 1632: 1631: 1629: 1626: 1622: 1619: 1617: 1614: 1611: 1608: 1607: 1605: 1601: 1597: 1596: 1594: 1593:Code page 950 1590: 1586: 1582: 1579: 1576: 1573: 1571: 1568: 1567: 1566: 1562: 1558: 1555: 1553: 1550: 1548: 1545: 1544: 1542: 1539: 1535: 1532: 1530: 1527: 1524: 1523:Code page 932 1520: 1517: 1516: 1514: 1511: 1509: 1506: 1504: 1501: 1499: 1496: 1494: 1491: 1489: 1485: 1481: 1478: 1476: 1473: 1468: 1465: 1462: 1459: 1456: 1453: 1450: 1447: 1444: 1441: 1438: 1435: 1432: 1429: 1426: 1423: 1420: 1417: 1416: 1414: 1411: 1409: 1405: 1401: 1397: 1393: 1389: 1385: 1381: 1377: 1373: 1369: 1365: 1361: 1357: 1353: 1349: 1346: 1341: 1338: 1335: 1332: 1329: 1326: 1323: 1320: 1317: 1314: 1311: 1308: 1305: 1302: 1299: 1296: 1293: 1290: 1287: 1284: 1281: 1278: 1275: 1272: 1269: 1266: 1263: 1260: 1257: 1254: 1253: 1251: 1248: 1246: 1243: 1239: 1236: 1235: 1234: 1231: 1230: 1228: 1226: 1222: 1218: 1214: 1210: 1206: 1193: 1184: 1180: 1177: 1176: 1174: 1171:This section 1169: 1166: 1162: 1161: 1157: 1149: 1144: 1141: 1140: 1139: 1137: 1130: 1127: 1124: 1121: 1118: 1115: 1112: 1108: 1105: 1104: 1103: 1101: 1097: 1095: 1086: 1084: 1041: 1040: 1038: 1034: 1007: 1006: 1004: 981: 980: 978: 955: 954: 952: 949:Five Unicode 948: 925: 924: 922: 918: 899: 898: 896: 892: 891: 890: 862: 854: 847: 844: 842:Section sign 841: 840: 836: 833: 830: 829: 825: 822: 819: 818: 814: 811: 809:Han for East 808: 807: 803: 800: 797: 796: 792: 789: 786: 785: 781: 778: 775: 774: 771: 768: 766: 762: 758: 749: 747: 744: 742: 738: 734: 729: 727: 723: 719: 715: 711: 707: 703: 698: 696: 692: 688: 684: 680: 676: 672: 668: 664: 660: 656: 652: 648: 643: 640: 636: 631: 629: 625: 621: 620: 615: 611: 606: 604: 599: 597: 593: 589: 585: 578: 576: 574: 570: 565: 563: 559: 555: 550: 546: 538: 533: 530: 528: 523: 520: 519: 518: 512: 507: 503: 500: 496: 493: 489: 485: 481: 478: 474: 473: 472: 466: 464: 462: 456: 452: 450: 449:code page 437 446: 442: 438: 434: 430: 425: 421: 413: 408: 403: 399: 395: 391: 388: 384: 381: 377: 374: 369: 365: 362: 358: 354: 351: 347: 343: 342:character set 339: 336: 335: 330: 329: 328: 321: 319: 317: 313: 303: 300: 294: 292: 288: 284: 280: 276: 272: 268: 264: 259: 257: 253: 249: 244: 236: 232: 230: 225: 220: 216: 212: 207: 205: 201: 196: 194: 190: 186: 185:amateur radio 182: 181:telegraph key 178: 174: 170: 166: 162: 158: 149: 147: 145: 141: 137: 133: 129: 125: 120: 118: 114: 110: 106: 102: 98: 93: 91: 90:character map 87: 83: 79: 76: 72: 68: 64: 60: 56: 53: 49: 42: 38: 34: 30: 19: 4240:ISO/IEC 6429 4197:Stanford/ITS 4184: 4118:ARIB STD-B24 3899:Sega SC-3000 3800:DEC RADIX 50 2837:ISO/IEC 8859 2829:ISO/IEC 2022 2574:Adaptations 2533:-14 (Celtic) 2528:-13 (Baltic) 2518:-10 (Nordic) 2513:-9 (Turkish) 2462:ISO/IEC 8859 2326: 2251:. Retrieved 2214: 2195: 2186: 2177: 2168: 2159: 2150: 2138:. Retrieved 2128: 2111:page numbers 2092: 2085:. Retrieved 2067: 2055:. Retrieved 2012: 1966: 1937:. Retrieved 1932: 1922: 1898: 1886:. Retrieved 1876: 1864:. Retrieved 1853: 1845: 1838:. Retrieved 1834: 1813: 1806:. Retrieved 1802: 1778:. Retrieved 1774: 1753: 1744: 1699: 1655:ISO/IEC 6937 1552:EUC-JIS-2004 1475:Mac OS Roman 1467:Windows-1258 1461:Windows-1257 1455:Windows-1256 1449:Windows-1255 1443:Windows-1254 1437:Windows-1253 1431:Windows-1252 1425:Windows-1251 1419:Windows-1250 1202: 1187: 1183:adding to it 1172: 1134: 1107:Web browsers 1098: 1090: 1081: 858: 769: 753: 745: 732: 730: 699: 679:ISO/IEC 2022 646: 644: 638: 634: 632: 617: 609: 607: 602: 600: 582: 566: 551: 548: 516: 470: 457: 453: 426: 423: 393: 386: 379: 367: 356: 341: 332: 325: 304: 295: 260: 241: 218: 215:Émile Baudot 208: 197: 189:aeronautical 153: 121: 94: 47: 46: 37:Punched tape 29: 3959:ZX Spectrum 3914:Sinclair QL 3750:Amstrad CPC 3669:8-bit Greek 3596:terminals ( 3309:Iran System 2861:("scripts") 2508:-8 (Hebrew) 2498:-6 (Arabic) 2395:ISO/IEC 646 2087:15 February 1939:10 November 1862:. Smartbear 1621:ISO-2022-KR 1534:ISO-2022-JP 1521:(Microsoft 1445:for Turkish 1340:ISO 8859-16 1334:ISO 8859-15 1328:ISO 8859-14 1322:ISO 8859-13 1316:ISO 8859-11 1310:ISO 8859-10 1094:transcoding 1087:Transcoding 951:code points 859:Consider a 720:, which is 712:, which is 619:code points 612:(CCS) is a 596:code points 513:Code points 429:page number 322:Terminology 312:code points 283:1400 series 279:7000 Series 200:Baudot code 113:punctuation 82:code points 71:transformed 67:transmitted 4337:Categories 4245:JIS X 0211 4153:ISO-IR-169 4006:UTF-EBCDIC 3572:code pages 3299:CSX+ Indic 2903:Devanagari 2858:Code pages 2779:LST 1590-4 2749:JIS X 0213 2744:JIS X 0212 2739:JIS X 0208 2734:JIS X 0201 2699:GOST 10859 2621:CCCII/EACC 2523:-11 (Thai) 2503:-7 (Greek) 2438:background 2361:Wabun/Kana 2253:August 25, 1888:1 November 1737:References 1598:Hong Kong 1541:JIS X 0213 1513:JIS X 0208 1457:for Arabic 1451:for Hebrew 1304:ISO 8859-9 1298:ISO 8859-8 1292:ISO 8859-7 1286:ISO 8859-6 1280:ISO 8859-5 1274:ISO 8859-4 1268:ISO 8859-3 1262:ISO 8859-2 1256:ISO 8859-1 999:0x00010400 995:0x00000063 991:0x00000332 987:0x00000062 983:0x00000061 820:Ampersand 776:Character 639:code units 554:diacritics 539:Characters 467:Code units 414:Code pages 387:code space 380:code point 231:standard. 177:Morse code 55:characters 4298:MICR code 4133:IEC-P27-1 4111:ISO-IR-68 4016:DIN 91379 3894:SAM Coupé 3829:GSM 03.38 3819:Galaksija 3314:Kamenický 3294:CSX Indic 3003:Ukrainian 2789:Shift JIS 2769:KS X 1002 2764:KS X 1001 2689:DIN 66003 2684:CNS 11643 2452:Transcode 2430:ITU T.101 2356:Non-Latin 2057:12 August 1933:InfoWorld 1840:2 January 1808:2 January 1799:"Charset" 1610:KS X 1001 1519:Shift JIS 1439:for Greek 1190:June 2024 921:graphemes 884:𐐀 700:Although 573:graphemes 562:Ligatures 433:Microsoft 420:Code page 394:code unit 361:code page 334:character 111:and some 97:telegraph 88:", or a " 86:code page 78:computers 52:graphical 4303:Mojibake 4158:ISO 2033 4123:Fieldata 4101:ASMO 449 4011:GB 18030 3971: / 3919:Teletext 3909:Sharp MZ 3839:HP FOCAL 3834:HP Roman 3765:Atari ST 3755:Apple II 3289:CS Indic 2983:Romanian 2958:Keyboard 2938:Gurmukhi 2933:Gujarati 2923:Georgian 2898:Cyrillic 2893:Croatian 2868:Armenian 2774:LST 1564 2759:KPS 9566 2719:GB 18030 2714:GB 12052 2709:GB 12345 2694:ELOT 927 2628:ISO 5426 2588:Estonian 2425:ITU T.61 2415:Teletext 2411:Videotex 2385:Fieldata 2371:Cyrillic 2244:Archived 2240:77-90165 2140:25 March 2136:. Oracle 2103:ISO 2022 2081:Archived 1911:Archived 1866:29 April 1780:29 April 1701:Mojibake 1673:Alt code 1662:See also 1581:GB 18030 1563:Chinese 1250:ISO 8859 787:Latin A 718:UTF-16BE 706:UTF-32LE 702:UTF-32BE 667:UTF-32LE 663:UTF-16LE 659:UTF-32BE 655:UTF-16BE 614:function 492:GB 18030 224:Fieldata 109:numerals 4192:SEASCII 4186:Mojikyō 4173:KOI8-RU 4096:ABICOMP 3969:Unicode 3879:PETSCII 3869:NEC APC 3805:DEC MCS 3760:ATASCII 3657:Swedish 3642:Finnish 3627:Spanish 3319:Mazovia 3284:ABICOMP 2993:Turkish 2948:Iceland 2856:Mac OS 2799:TIS-620 2704:GB 2312 2679:BraSCII 2669:ArmSCII 2407:Teletex 2366:Chinese 1775:W3Techs 1708:Mojikyō 1628:Unicode 1606:Korean 1587:Taiwan 1570:GB 2312 1565:Guobiao 1233:ISO 646 1207:on the 1136:Windows 973:U+10400 879:U+10400 871:̲ 855:Example 845:U+00A7 834:U+00A1 823:U+0026 812:U+6771 801:U+00DF 790:U+0041 737:Unicode 584:Unicode 445:Windows 256:IBM 603 193:Unicode 161:Braille 150:History 126:on the 117:Unicode 75:digital 4202:Symbol 4178:KOI8-U 4168:KOI8-R 4036:TACE16 4026:CESU-8 4021:BOCU-1 4001:UTF-32 3996:UTF-16 3939:WISCII 3929:TRS-80 3849:SQUOZE 3844:HP RPL 3684:Hebrew 3679:SI 960 3647:French 3570:EBCDIC 3460:CER-GS 2943:Hebrew 2918:Gaelic 2883:Celtic 2873:Arabic 2819:YUSCII 2809:VISCII 2794:SI 960 2784:PASCII 2633:5426-2 2611:MARC-8 2346:Needle 2238:  2228:  2115:PCTerm 2105:, the 2023:  1644:UTF-32 1639:UTF-16 1616:EUC-KR 1529:EUC-JP 1508:VISCII 1484:KOI8-U 1480:KOI8-R 1300:Hebrew 1288:Arabic 1245:EBCDIC 1225:UTF-16 1029:0xDC00 1025:0xD801 1021:0x0063 1017:0x0332 1013:0x0062 1009:0x0061 969:U+0063 965:U+0332 961:U+0062 957:U+0061 881: 868: 866:U+0332 861:string 826:& 782:Glyph 757:planes 675:UTF-32 671:UTF-16 665:, and 506:UTF-32 499:UTF-16 488:EBCDIC 461:CCSIDs 439:, and 229:ECMA-6 219:baudot 202:, the 144:UTF-16 73:using 69:, and 63:stored 4273:CCSID 4146:8-bit 4141:7-bit 4137:INIS 3991:UTF-8 3986:UTF-7 3981:UTF-1 3859:LMBCS 3795:CP/M+ 3637:Dutch 3622:Swiss 3304:CWI-2 3008:VT100 2978:Roman 2973:Ogham 2953:Inuit 2928:Greek 2814:VSCII 2804:TSCII 2754:KOI-7 2729:ISCII 2724:HKSCS 2616:ANSEL 2578:Welsh 2402:BCDIC 2390:ASCII 2351:Morse 2247:(PDF) 2219:(PDF) 2107:VT510 2017:(PDF) 1914:(PDF) 1907:(PDF) 1651:ANSEL 1634:UTF-8 1600:HKSCS 1503:TSCII 1498:ISCII 1408:CP872 1404:CP869 1400:CP866 1396:CP865 1392:CP863 1388:CP862 1384:CP861 1380:CP860 1376:CP858 1372:CP857 1368:CP855 1364:CP852 1360:CP850 1356:CP737 1352:CP720 1348:CP437 1294:Greek 1238:ASCII 1213:UTF-8 1117:iconv 1037:bytes 919:Five 893:Four 710:UTF-8 651:UTF-8 592:bytes 569:glyph 558:glyph 484:UTF-8 477:ASCII 132:UTF-8 41:ASCII 4207:TRON 4060:Cork 4031:SCSU 3954:ZX81 3949:ZX80 3944:XCCS 3874:NeXT 3854:LICS 3809:NRCS 3770:BICS 3740:1058 3735:1057 3730:1056 3725:1055 3720:1054 3715:1053 3710:1052 3584:DKOI 3540:1270 3535:1258 3530:1257 3525:1256 3520:1255 3515:1254 3510:1253 3505:1252 3500:1251 3495:1250 3485:1169 3442:1133 3437:1124 3432:1046 3427:1019 3422:1018 3417:1017 3412:1016 3407:1015 3402:1014 3397:1013 3392:1012 3387:1010 3382:1009 3377:1008 3372:1006 3279:3846 3274:1127 3269:1118 3264:1117 3259:1116 3254:1115 3249:1098 3244:1044 3239:1043 3234:1042 3229:1040 3224:1034 2988:Sámi 2674:Big5 2653:6862 2648:6438 2643:5428 2638:5427 2568:Sámi 2443:sets 2409:and 2255:2019 2236:LCCN 2226:ISBN 2142:2018 2097:and 2089:2017 2059:2023 2021:ISBN 1941:2020 1890:2018 1868:2014 1842:2021 1810:2021 1782:2024 1719:TRON 1589:Big5 1488:KOI7 1318:Thai 1219:and 1203:The 1123:luit 1075:0x80 1071:0x90 1067:0x90 1063:0xF0 1059:0x63 1055:0xB2 1051:0xCC 1047:0x62 1043:0x61 704:and 695:BOCU 693:and 691:SCSU 677:and 490:and 402:word 348:and 281:and 273:and 187:and 138:and 122:The 4163:KOI 4080:OT1 4075:OMS 4070:OML 4065:LY1 4051:TeX 3864:MSX 3824:GEM 3780:CDC 3598:VTx 3594:DEC 3480:950 3474:GBK 3470:936 3465:932 3367:922 3362:921 3357:915 3352:912 3347:896 3342:895 3324:MIK 3219:951 3214:950 3209:949 3204:942 3199:936 3194:932 3189:904 3184:903 3179:899 3174:897 3169:869 3164:868 3159:867 3154:866 3149:865 3144:864 3139:863 3134:862 3129:861 3124:860 3119:859 3114:858 3109:857 3104:856 3099:855 3094:853 3089:852 3084:851 3079:850 3074:778 3069:777 3064:776 3059:775 3054:773 3049:770 3044:737 3039:720 3034:708 3029:668 3024:437 2099:ISO 2095:DEC 1653:or 1575:GBK 1493:MIK 1211:is 1209:web 1185:. 741:XML 697:). 685:or 628:500 601:An 451:). 437:SAP 275:704 271:702 267:BCD 252:IBM 211:bit 195:). 130:is 128:web 92:". 4339:: 4128:HZ 2242:. 2234:. 2194:. 2176:. 2158:. 2091:. 2075:. 2035:^ 2002:^ 1975:^ 1965:. 1949:^ 1931:. 1844:. 1833:. 1821:^ 1812:. 1801:. 1790:^ 1773:. 1762:^ 1752:. 1595:) 1486:, 1482:, 1415:: 1406:, 1402:, 1398:, 1394:, 1390:, 1386:, 1382:, 1378:, 1374:, 1370:, 1366:, 1362:, 1358:, 1354:, 1350:, 1252:: 1138:: 1102:: 1073:, 1069:, 1065:, 1061:, 1057:, 1053:, 1049:, 1045:, 1039:) 1027:, 1023:, 1019:, 1015:, 1011:, 997:, 993:, 989:, 985:, 971:, 967:, 963:, 959:, 953:: 943:𐐀 941:, 937:, 933:, 929:, 923:: 913:𐐀 911:, 907:, 905:b̲ 903:, 897:: 848:§ 837:¡ 815:東 804:ß 793:Α 767:. 673:, 661:, 657:, 653:, 645:A 633:A 608:A 486:, 435:, 392:A 385:A 378:A 375:). 366:A 355:A 340:A 331:A 163:, 159:, 107:, 65:, 3807:/ 3600:) 3476:) 3472:( 2413:/ 2319:e 2312:t 2305:v 2257:. 2162:. 2144:. 2061:. 2029:. 1969:. 1943:. 1892:. 1870:. 1784:. 1192:) 1188:( 939:c 935:_ 931:b 927:a 909:c 901:a 459:( 409:. 363:. 265:( 171:( 20:)

Index

Coded character set

Punched tape
ASCII
graphical
characters
human language
stored
transmitted
transformed
digital
computers
code points
code page
character map
telegraph
written languages
upper case letters
numerals
punctuation
Unicode
most used character encoding
web
UTF-8
application programs
operating system
UTF-16
Bacon's cipher
Braille
international maritime signal flags

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.