Knowledge (XXG)

Character encodings in HTML


561:
Successful viewing of a page is not necessarily an indication that its encoding is specified correctly. If the page's creator and reader are both assuming some platform-specific character encoding, and the server does not send any identifying information, then the reader will nonetheless see the page
1378:
Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for markup delimiting characters as mentioned above, and for a few special characters (or
578:
HTML 5.0 and 5.1) specifies a list of encodings which browsers must support. The HTML standards forbid support of other encodings. The Encoding Standard further stipulates that new formats, new protocols (even when existing formats are used) and authors of new documents are required to use
477:
With this second approach, because the character encoding cannot be known until the declaration is parsed, there is a problem knowing which character encoding is used in the document up to and including the declaration itself. If the character encoding is an
1300:
used by authors of HTML documents, will be able to render all HTML characters. Most modern software is able to display most or all of the characters for the user's language, and will draw a box or other clear indicator for characters it cannot render.
1128:, which also allow sequences of ASCII bytes to be interpreted differently, this approach was not seen as feasible for them since they are comparatively more frequently used in deployed content. The following encodings receive this treatment: 482:
then the content up to and including the declaration itself should be pure ASCII and this will work correctly. For character encodings that are not ASCII extensions (i.e. not a superset of ASCII), such as
1500:(which gives é, Latin lower-case E with acute accent, U+00E9 in Unicode) in an XML document will generate an error unless the entity has already been defined. XML also requires that the 535:) language environments where there are several different multi-byte encodings in use, auto-detection is also often employed. Finally, browsers usually permit the user to override 1120:) which may exploit a difference between the client and server in what encodings are supported in order to mask malicious content. Although the same security concern applies to 1541: 1372: 1309: 150: 20: 503:. An "encoding sniffing algorithm" is defined in the specification to determine the character encoding of the document based on multiple sources of input, including: 760:(U+3000) for compatibility reasons, and as such excluding U+E5E5 (a private use character). Also, specified with 0x80 accepted as an alternative encoding of the 2462: 558:
ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.
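The efficiency point can be checked with a short Python sketch. The helper name `encoded_sizes` and the sample markup are invented for this illustration; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of `text` for three Unicode encodings."""
    return {codec: len(text.encode(codec))
            for codec in ("utf-8", "utf-16-le", "utf-32-le")}


# Hypothetical sample: markup is mostly ASCII even when the content is not.
sample = '<p class="note">Hello, world!</p>'
sizes = encoded_sizes(sample)
for codec, size in sizes.items():
    print(f"{codec}: {size} bytes")
```

For purely ASCII input such as this, UTF-16 takes exactly twice and UTF-32 exactly four times as many bytes as UTF-8; the ratio narrows as the share of non-ASCII text grows.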
1007:
Uses the same encoder and decoder as ISO-8859-8, but is not subject to the visual-order behaviour which is used for documents labelled as ISO-8859-8.
1095: 276: 871:(BOM), if present, takes priority over any label. Specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in 554:, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a 1407:
there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:
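The five predefined references are `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`. The following Python sketch, which uses the standard library's `xml.sax.saxutils.escape` and `xml.etree.ElementTree`, shows one way to apply them and how a parser without a DTD reacts to other entity names. The helper names `xml_escape` and `parses` and the sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_escape(text):
    """Escape all five markup-sensitive characters using XML's predefined entities."""
    # escape() handles &, < and > itself; quotes are added through the extra mapping.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


def parses(xml_text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


escaped = xml_escape("Tom & 'Jerry' <\"cat\">")
predefined_ok = parses("<p>&amp;&lt;&gt;&quot;&apos;</p>")
undefined_entity_ok = parses("<p>&eacute;</p>")   # entity not defined in plain XML
lowercase_hex_ok = parses("<p>&#xe9;</p>")        # numeric reference, lowercase x
uppercase_hex_ok = parses("<p>&#XE9;</p>")        # uppercase X is rejected in XML
```

A numeric reference is therefore the portable choice in XML when an entity such as `&eacute;` has not been declared.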
1090: 1064:
The specification uses the same index as used for Shift JIS (insofar as is within reach of the EUC code set 1), i.e. includes NEC extensions.
531:-speaking users, but other languages regularly—in some cases, always—require characters outside that range. In Chinese, Japanese, and Korean ( 777: 780:
variant, although most of the HKSCS extensions (those with lead bytes less than 0xA1) are not included by the encoder, only by the decoder.
1395:, such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters. 562:
as the creator intended, but other readers on different platforms or with different native languages will not see the page as intended.
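A minimal Python sketch of this failure mode follows. The choice of Windows-1252 and Mac OS Roman as the two platform defaults is an assumption made purely for illustration; any pair of differing legacy encodings would show the same effect.

```python
text = "café"

# The author's platform silently saves the page as Windows-1252, with no label.
raw = text.encode("cp1252")

# A reader whose platform makes the same assumption sees the intended text.
same_platform = raw.decode("cp1252")

# A reader whose platform assumes Mac OS Roman decodes the same bytes differently.
other_platform = raw.decode("mac_roman")

print(same_platform == text)    # the unlabelled page "works" here
print(other_platform == text)   # but not here
```

Only the non-ASCII character is affected, which is why such pages can appear correct for a long time before the problem is noticed.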
1312:. Only a few higher-numbered codes can be created using entity names, but all can be created by decimal number character reference. 586:
Besides UTF-8, the following encodings are explicitly listed in the HTML standard itself, with reference to the Encoding Standard:
574:
Encoding Standard, referenced by recent HTML standards (the current WHATWG HTML Living Standard, as well as the formerly competing
491:, a processor of HTML, such as a web browser, should be able to parse the declaration in some cases through the use of heuristics. 1308:
standard set, most of these characters can be used without a character reference. Codes from 160 to 255 can all be created using
836: 523:
Analysis of the document bytes looking for specific sequences or ranges of byte values, and other tentative detection mechanisms.
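Some of these inputs can be illustrated with a simplified Python sketch. It is not the algorithm defined in the specification: the function name `sniff_encoding`, the regular expressions, and the order in which the sources are consulted (byte order mark, then HTTP header, then meta tag) are assumptions made for this example, and user overrides and byte-pattern analysis are left out.

```python
import re

# Byte order marks that fit within the first three bytes of a document.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16le"),
    (b"\xfe\xff", "utf-16be"),
)

# Matches charset=label, with or without quotes around the label.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def sniff_encoding(data, content_type=None):
    """Return a lower-cased encoding label, or None if no source names one."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    if content_type:
        match = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
        if match:
            return match.group(1).decode("ascii").lower()
    # Only the first 1024 bytes are searched for a meta declaration.
    for tag in re.findall(rb"<meta[^>]*>", data[:1024], re.IGNORECASE):
        match = _CHARSET_RE.search(tag)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

Returning `None` leaves room for a caller to fall back to a locale default or to statistical detection, which this sketch does not attempt.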
1625: 2438:
HTML Entity Encoding chapter of Browser Security Handbook – more information about current browsers and their entity handling
307: 328:
were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit
281: 238: 177: 1690: 895:
The following additional encodings are listed in the Encoding Standard, and support for them is therefore also required:
810:
The specification uses the same index as used for Shift JIS (insofar as is within reach), i.e. includes NEC extensions.
1893: 527:
Characters outside of the printable ASCII range (32 to 126) usually appear incorrectly. This presents few problems for
1371:. For a list of all named HTML character entity references along with the versions in which they were introduced, see 1165: 360: 2442: 1161: 1633: 1387:
is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as
324:) has been in use since 1991, HTML 4.0 from December 1997 was the first standardized version where international 233: 814:
is converted to fullwidth by the encoder, but accepted using an escape sequence (ESC 0x28 0x49) by the decoder.
2387: 1658: 1112:
The standard also defines a "replacement" decoder, which maps all content labelled as certain encodings to the
987: 982: 1215: 1745: 382: 135: 1113: 325: 167: 27: 1496:
All other character entity references have to be defined before they can be used. For example, use of
1392: 1388: 1117: 172: 89: 1516:, which is an XML application, supports the HTML entity set, along with XML's predefined entities. 884:
Maps 0x00 through 0x7F to U+0000 through U+007F, and 0x80 through 0xFF to U+F780 through U+F7FF (a
832: 378: 2437: 2432: 2361: 1889: 1606: 1582: 542:
It is increasingly common for multilingual websites and websites in non-Western languages to use
300: 189: 2181: 1967: 1704: 848:
Specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in
2357: 2331: 2305: 2279: 2253: 2206: 2177: 2151: 2125: 2099: 2073: 2044: 2018: 1992: 1963: 1937: 1911: 1858: 1832: 1803: 1774: 1722: 757: 377:
This method gives the HTTP server a convenient way to alter a document's encoding according to
1574: 1531: 1525: 885: 528: 155: 868: 811: 514: 479: 116: 57: 2335: 2022: 1996: 348:
There are two general ways to specify which character encoding is used in the document.
2232: 1915: 532: 260: 123: 67: 2443:
The Open Web Application Security Project's wiki article on cross-site scripting (XSS)
2457: 2451: 2427: 2077: 1778: 1562: 1536: 900: 555: 293: 162: 128: 111: 1862: 1807: 1586: 1055:) excludes four-byte codes, and favours the one-byte 0x80 representation for U+20AC. 2155: 1293: 977: 972: 925: 646: 641: 636: 631: 626: 621: 616: 611: 106: 101: 96: 47: 2283: 1941: 1836: 888:
range), such that the low 8 bits of the code point always match the original byte.
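The mapping described in this note is simple enough to sketch in a few lines of Python. The function name `decode_x_user_defined` is hypothetical; only the byte-to-code-point arithmetic comes from the description above.

```python
def decode_x_user_defined(data):
    """Map bytes 0x00-0x7F to U+0000-U+007F and 0x80-0xFF to U+F780-U+F7FF."""
    return "".join(
        chr(byte) if byte < 0x80 else chr(0xF780 + (byte - 0x80))
        for byte in data
    )


decoded = decode_x_user_defined(b"A\x80\xff")

# The original bytes can be recovered by keeping the low 8 bits of each code point.
recovered = bytes(ord(ch) & 0xFF for ch in decoded)
```

Because the low 8 bits are preserved, scripts can use this encoding to carry arbitrary binary data through a text channel and mask it back out afterwards.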
2048: 1327:
is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as
2309: 1403:
Unlike traditional HTML with its large range of character entity references, in
1297: 1289: 1266: 1183: 1143: 1138: 1133: 1121: 1075:
The following encodings are listed as explicit examples of forbidden encodings:
953: 948: 943: 938: 933: 798: 765: 707: 666: 606: 337: 255: 250: 140: 84: 2391: 2210: 1662: 1148: 1116:(�), refusing to process it at all. This is intended to prevent attacks (e.g. 1065: 920: 915: 910: 905: 743: 728: 601: 596: 591: 352: 199: 194: 72: 62: 2103: 1391:. If HTML attributes are left unquoted, certain characters, most importantly 1170:
In addition to native character encodings, characters can also be encoded as
1421: 815: 761: 661: 333: 2129: 2383: 1654: 1528:– used by many browsers when character encoding metadata is not available 819: 681: 676: 651: 488: 484: 2257: 1380: 1363:
are already used to delimit markup. This notably did not include XML's
1258: 1219: 1179: 1043: 1025: 967: 701: 1285:
may mix uppercase and lowercase, though uppercase is the usual style.
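The equivalence of these forms can be checked with Python's standard `html.unescape`, which decodes character references according to HTML rules. The helper `to_references` and the choice of U+03BB as the sample character are assumptions made for this illustration.

```python
import html

# Decimal and hexadecimal references to the same code point, U+03BB (decimal 955),
# with and without leading zeros and with mixed-case hexadecimal digits.
forms = ["&#955;", "&#0955;", "&#x3BB;", "&#x3bb;", "&#x03Bb;"]
decoded = [html.unescape(ref) for ref in forms]


def to_references(ch):
    """Return the decimal and hexadecimal numeric references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(decoded)
print(to_references("é"))
```

Generating references with uppercase hexadecimal digits and a lowercase `x`, as `to_references` does, keeps the output valid in both HTML and XML documents.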
861:
For compatibility with deployed content, also specified for the plain
381:; certain HTTP server software can do it, for example Apache with the 2369: 2343: 2317: 2291: 2265: 2218: 2189: 2163: 2137: 2111: 2085: 2056: 2030: 2004: 1975: 1949: 1923: 1870: 1844: 1815: 1786: 1578: 1125: 1105: 1100: 1080: 992: 963: 958: 863: 671: 571: 551: 547: 463:
documents have a third option: to express the character encoding via
245: 221: 2422: 1051:
for decoding purposes. For encoding purposes, labelling as GBK (or
1513: 1384: 1368: 1305: 1085: 872: 849: 722: 580: 543: 500: 460: 432: 391:
Second, a declaration can be included within the document itself.
329: 226: 216: 211: 204: 79: 52: 1570: 1199: 1190:. Character entity references are also sometimes referred to as 1029: 656: 510:
An explicit meta tag within the first 1024 bytes of the document
472:<?xml version="1.0" encoding="utf-8"?> 321: 38: 28:
Help:Percent-encoding § Fixing Links with Unsupported Characters
1504:
in hexadecimal numeric references be in lowercase: for example
1281:
may be any number of digits and may include leading zeros. The
394:
For HTML it is possible to include this information inside the
2399: 2237: 1670: 1404: 794: 790: 575: 464: 182: 1567:
Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
1198:
for HTML. HTML's usage of character references derives from
16:
Use of encoding systems for international characters in HTML
835:(Windows-949), which is a superset which covers the entire 546:, which allows use of the same encoding for all languages. 435:
also allows the following syntax to mean exactly the same:
768:). Otherwise, follows the mappings from the 2005 standard. 520:
The HTTP Content-Type or other transport layer information
822:(0x0E and 0x0F) are excluded entirely to prevent attacks. 1691:"HTML5 prescan a byte stream to determine its encoding" 1331:
in an HTML document. The character entity references
1315:
Character entity references can also have the format
756:
Specified with 0xA3A0 as a duplicate encoding of the
332:, two goals are worth considering: the information's 1987: 1985: 517:(BOM) within the first three bytes of the document 1769: 1767: 1765: 1763: 1761: 1759: 1542:List of XML and HTML character entity references 1373:List of XML and HTML character entity references 21:List of XML and HTML character entity references 2233:"Bug 17053: Support KOI8-RU mapping for KOI8-U" 2201: 2199: 19:For a list of character entity references, see 2433:The Definitive Guide to Web Character Encoding 1884: 1882: 1880: 1626:"Specifying the document's character encoding" 1620: 1618: 1616: 367:header, which would typically look like this: 2423:Online HTML entity encoder & decoder tool 2390:; Maler, E.; Yergeau, F. (26 November 2008), 1740: 1738: 1736: 1661:; Maler, E.; Yergeau, F. (26 November 2008), 301: 26:For fixing links within Knowledge (XXG), see 8: 2068: 2066: 1304:For codes from 0 to 127, the original 7-bit 344:Specifying the document's character encoding 1968:"5. Indexes (§ Index ISO-2022-JP katakana)" 2182:"9. Legacy single-byte encodings (§ Note)" 1827: 1825: 1028:in positions 0xAE and 0xBE (i.e. includes 308: 294: 34: 1347:are predefined in HTML and SGML, because 1798: 1796: 1409: 1296:used by receivers of HTML documents, or 1273:must be lowercase in XML documents. The 1047:and related labels. Handled the same as 2362:"4.2: Names and labels (§ replacement)" 1561:Fielding, R.; Reschke, J. (June 2014), 1553: 1000: 692: 499:As of HTML5 the recommended charset is 355:can include the character encoding or " 268: 37: 1894:"Notable Differences from IANA Naming" 1663:"Prolog and Document Type Declaration" 398:element near the top of the document: 372:Content-Type: text/html; charset=utf-8 1565:, in Fielding, R; Reschke, J (eds.), 1214:in HTML refers to a character by its 1016:Titled KOI8-U and specified for both 778:Hong Kong Supplementary Character Set 7: 2428:Character entity references in HTML4 424:"text/html; charset=utf-8" 2463:World Wide Web Consortium standards 2130:"6. 
Hooks for standards (§ decode)" 1916:"5. Indexes (§ index Big5 pointer)" 1032:) but KOI8-U in positions 0x93–9F. 797:extensions, and is more precisely 343: 14: 2392:"Character and Entity References" 320:While Hypertext Markup Language ( 1607:"Apache Module mod_charset_lite" 539:charset label manually as well. 2284:"5. Indexes (§ Index jis0212)" 1942:"5. Indexes (§ Index jis0208)" 1837:"5. Indexes (§ index gb18030)" 1746:"12.2.3.3 Character encodings" 1723:"8.2.2.3. Character encodings" 1705:"8.2.2.3. Character encodings" 1068:is included for decoding only. 1: 2049:"5. Indexes (§ index EUC-KR)" 2023:"12.2.2. ISO-2022-JP encoder" 1997:"12.2.1. ISO-2022-JP decoder" 282:Comparison of browser engines 2211:"index KOI8-U visualization" 1483: 1467: 1451: 1435: 1417: 1176:numeric character references 495:Encoding detection algorithm 1212:numeric character reference 1188:character entity references 1166:Numeric character reference 789:The specification includes 361:Hypertext Transfer Protocol 2479: 1162:Character entity reference 1159: 25: 18: 1863:"10.2.1. gb18030 decoder" 1808:"10.2.2. gb18030 encoder" 1634:World Wide Web Consortium 1206:HTML character references 507:Explicit user instruction 467:declaration, as follows: 277:Document markup languages 2336:"2: Security background" 1399:XML character references 1379:none at all if a native 469: 437: 415:"Content-Type" 400: 369: 2078:"4.3. Output encodings" 1779:"4.2: Names and labels" 1216:Universal Character Set 2156:"14.5. x-user-defined" 1310:character entity names 1225:, and uses the format 1265:is the code point in 1257:is the code point in 1114:replacement character 168:Document Object Model 2388:Sperberg-McQueen, C. 1750:HTML Living Standard 1659:Sperberg-McQueen, C. 1389:cross-site scripting 1367:(') entity prior to 1172:character references 1156:Character references 1118:cross site scripting 173:Browser Object Model 2310:"14.1: replacement" 1041:Also specified for 833:Unified Hangul Code 747:and related labels. 
741:Also specified for 732:and related labels. 720:Also specified for 711:and related labels. 699:Also specified for 566:Permitted encodings 379:content negotiation 146:Character encodings 2358:van Kesteren, Anne 2332:van Kesteren, Anne 2306:van Kesteren, Anne 2280:van Kesteren, Anne 2254:van Kesteren, Anne 2207:van Kesteren, Anne 2178:van Kesteren, Anne 2152:van Kesteren, Anne 2126:van Kesteren, Anne 2100:van Kesteren, Anne 2074:van Kesteren, Anne 2045:van Kesteren, Anne 2019:van Kesteren, Anne 1993:van Kesteren, Anne 1964:van Kesteren, Anne 1938:van Kesteren, Anne 1912:van Kesteren, Anne 1890:Mozilla Foundation 1859:van Kesteren, Anne 1833:van Kesteren, Anne 1804:van Kesteren, Anne 1775:van Kesteren, Anne 1636:, 14 December 2017 867:label, although a 2366:Encoding Standard 2340:Encoding Standard 2314:Encoding Standard 2288:Encoding Standard 2262:Encoding Standard 2242:. 19 August 2015. 2215:Encoding Standard 2186:Encoding Standard 2160:Encoding Standard 2134:Encoding Standard 2108:Encoding Standard 2082:Encoding Standard 2053:Encoding Standard 2027:Encoding Standard 2001:Encoding Standard 1972:Encoding Standard 1946:Encoding Standard 1920:Encoding Standard 1898:Crate encoding_rs 1867:Encoding Standard 1841:Encoding Standard 1812:Encoding Standard 1783:Encoding Standard 1709:HTML 5.1 Standard 1494: 1493: 1455:greater-than sign 758:ideographic space 452:"utf-8" 318: 317: 2470: 2410: 2409: 2408: 2406: 2380: 2374: 2373: 2354: 2348: 2347: 2328: 2322: 2321: 2302: 2296: 2295: 2276: 2270: 2269: 2250: 2244: 2243: 2229: 2223: 2222: 2203: 2194: 2193: 2174: 2168: 2167: 2148: 2142: 2141: 2122: 2116: 2115: 2104:"14.4. 
UTF-16LE" 2096: 2090: 2089: 2070: 2061: 2060: 2041: 2035: 2034: 2015: 2009: 2008: 1989: 1980: 1979: 1960: 1954: 1953: 1934: 1928: 1927: 1908: 1902: 1901: 1886: 1875: 1874: 1855: 1849: 1848: 1829: 1820: 1819: 1800: 1791: 1790: 1771: 1754: 1753: 1742: 1731: 1730: 1719: 1713: 1712: 1701: 1695: 1694: 1687: 1681: 1680: 1679: 1677: 1651: 1645: 1644: 1643: 1641: 1622: 1611: 1610: 1603: 1597: 1596: 1595: 1593: 1579:10.17487/RFC7231 1558: 1532:Unicode and HTML 1526:Charset sniffing 1511: 1507: 1503: 1499: 1481: 1465: 1449: 1433: 1415: 1410: 1366: 1362: 1358: 1354: 1350: 1346: 1342: 1338: 1334: 1330: 1322: 1249: 1235: 1069: 1062: 1056: 1054: 1050: 1046: 1039: 1033: 1024:labels; follows 1023: 1019: 1014: 1008: 1005: 929: 889: 886:Private Use Area 882: 876: 866: 859: 853: 846: 840: 837:Hangul Syllables 829: 823: 808: 802: 787: 781: 775: 769: 754: 748: 746: 739: 733: 731: 725: 718: 712: 710: 704: 697: 473: 456: 453: 450: 447: 444: 441: 428: 425: 422: 419: 416: 413: 410: 407: 404: 397: 387: 386:mod_charset_lite 373: 366: 358: 336:, and universal 310: 303: 296: 261:Rendering engine 151:named characters 35: 2478: 2477: 2473: 2472: 2471: 2469: 2468: 2467: 2448: 2447: 2419: 2414: 2413: 2404: 2402: 2382: 2381: 2377: 2356: 2355: 2351: 2330: 2329: 2325: 2304: 2303: 2299: 2278: 2277: 2273: 2252: 2251: 2247: 2231: 2230: 2226: 2205: 2204: 2197: 2176: 2175: 2171: 2150: 2149: 2145: 2124: 2123: 2119: 2098: 2097: 2093: 2072: 2071: 2064: 2043: 2042: 2038: 2017: 2016: 2012: 1991: 1990: 1983: 1962: 1961: 1957: 1936: 1935: 1931: 1910: 1909: 1905: 1888: 1887: 1878: 1857: 1856: 1852: 1831: 1830: 1823: 1802: 1801: 1794: 1773: 1772: 1757: 1744: 1743: 1734: 1727:HTML 5 Standard 1721: 1720: 1716: 1703: 1702: 1698: 1689: 1688: 1684: 1675: 1673: 1653: 1652: 1648: 1639: 1637: 1624: 1623: 1614: 1605: 1604: 1600: 1591: 1589: 1560: 1559: 1555: 1550: 1522: 1509: 1505: 1501: 1497: 1479: 1463: 1447: 1431: 1413: 1401: 1364: 1360: 1356: 1352: 1348: 1344: 1340: 1336: 1332: 1328: 1316: 1243: 1229: 1208: 1174:, 
which can be 1168: 1160:Main articles: 1158: 1153: 1144:ISO-2022-CN-EXT 1110: 1073: 1072: 1063: 1059: 1052: 1048: 1042: 1040: 1036: 1021: 1017: 1015: 1011: 1006: 1002: 997: 983:Mac OS Cyrillic 927: 893: 892: 883: 879: 869:byte order mark 862: 860: 856: 847: 843: 830: 826: 812:Half-width kana 809: 805: 788: 784: 776: 772: 755: 751: 742: 740: 736: 727: 721: 719: 715: 706: 700: 698: 694: 689: 568: 515:byte order mark 497: 480:ASCII extension 475: 474: 471: 458: 457: 454: 451: 448: 445: 442: 439: 430: 429: 426: 423: 420: 417: 414: 411: 408: 405: 402: 395: 385: 375: 374: 371: 364: 356: 346: 314: 31: 24: 17: 12: 11: 5: 2476: 2474: 2466: 2465: 2460: 2450: 2449: 2446: 2445: 2440: 2435: 2430: 2425: 2418: 2417:External links 2415: 2412: 2411: 2375: 2349: 2323: 2297: 2271: 2245: 2224: 2195: 2169: 2143: 2117: 2091: 2062: 2036: 2010: 1981: 1955: 1929: 1903: 1876: 1850: 1821: 1792: 1755: 1732: 1714: 1696: 1682: 1646: 1612: 1598: 1563:"Content-Type" 1552: 1551: 1549: 1546: 1545: 1544: 1539: 1534: 1529: 1521: 1518: 1492: 1491: 1488: 1485: 1482: 1476: 1475: 1472: 1471:quotation mark 1469: 1466: 1460: 1459: 1456: 1453: 1450: 1444: 1443: 1440: 1439:less-than sign 1437: 1434: 1428: 1427: 1424: 1419: 1416: 1400: 1397: 1383:encoding like 1251: 1250: 1237: 1236: 1207: 1204: 1192:named entities 1157: 1154: 1152: 1151: 1146: 1141: 1136: 1130: 1109: 1108: 1103: 1098: 1093: 1088: 1083: 1077: 1071: 1070: 1057: 1034: 1009: 999: 998: 996: 995: 990: 985: 980: 975: 970: 961: 956: 951: 946: 941: 936: 931: 923: 918: 913: 908: 903: 897: 891: 890: 877: 854: 841: 824: 803: 782: 770: 749: 734: 713: 691: 690: 688: 687: 686:x-user-defined 684: 679: 674: 669: 664: 659: 654: 649: 644: 639: 634: 629: 624: 619: 614: 609: 604: 599: 594: 588: 567: 564: 525: 524: 521: 518: 511: 508: 496: 493: 470: 438: 401: 370: 345: 342: 316: 315: 313: 312: 305: 298: 290: 287: 286: 285: 284: 279: 271: 270: 266: 265: 264: 263: 258: 253: 248: 243: 242: 241: 231: 230: 229: 224: 219: 209: 208: 207: 197: 192: 187: 186: 185: 175: 
170: 165: 160: 159: 158: 153: 143: 138: 133: 132: 131: 124:HTML attribute 121: 120: 119: 114: 109: 104: 94: 93: 92: 90:Mobile Profile 87: 77: 76: 75: 70: 65: 60: 50: 42: 41: 15: 13: 10: 9: 6: 4: 3: 2: 2475: 2464: 2461: 2459: 2456: 2455: 2453: 2444: 2441: 2439: 2436: 2434: 2431: 2429: 2426: 2424: 2421: 2420: 2416: 2401: 2397: 2393: 2389: 2386:; Paoli, J.; 2385: 2379: 2376: 2371: 2367: 2363: 2359: 2353: 2350: 2345: 2341: 2337: 2333: 2327: 2324: 2319: 2315: 2311: 2307: 2301: 2298: 2293: 2289: 2285: 2281: 2275: 2272: 2267: 2263: 2259: 2255: 2249: 2246: 2241: 2239: 2234: 2228: 2225: 2220: 2216: 2212: 2208: 2202: 2200: 2196: 2191: 2187: 2183: 2179: 2173: 2170: 2165: 2161: 2157: 2153: 2147: 2144: 2139: 2135: 2131: 2127: 2121: 2118: 2113: 2109: 2105: 2101: 2095: 2092: 2087: 2083: 2079: 2075: 2069: 2067: 2063: 2058: 2054: 2050: 2046: 2040: 2037: 2032: 2028: 2024: 2020: 2014: 2011: 2006: 2002: 1998: 1994: 1988: 1986: 1982: 1977: 1973: 1969: 1965: 1959: 1956: 1951: 1947: 1943: 1939: 1933: 1930: 1925: 1921: 1917: 1913: 1907: 1904: 1899: 1895: 1891: 1885: 1883: 1881: 1877: 1872: 1868: 1864: 1860: 1854: 1851: 1846: 1842: 1838: 1834: 1828: 1826: 1822: 1817: 1813: 1809: 1805: 1799: 1797: 1793: 1788: 1784: 1780: 1776: 1770: 1768: 1766: 1764: 1762: 1760: 1756: 1751: 1747: 1741: 1739: 1737: 1733: 1728: 1724: 1718: 1715: 1710: 1706: 1700: 1697: 1692: 1686: 1683: 1672: 1668: 1664: 1660: 1657:; Paoli, J.; 1656: 1650: 1647: 1635: 1631: 1627: 1621: 1619: 1617: 1613: 1608: 1602: 1599: 1588: 1584: 1580: 1576: 1572: 1568: 1564: 1557: 1554: 1547: 1543: 1540: 1538: 1537:Language code 1535: 1533: 1530: 1527: 1524: 1523: 1519: 1517: 1515: 1489: 1486: 1478: 1477: 1473: 1470: 1462: 1461: 1457: 1454: 1446: 1445: 1441: 1438: 1430: 1429: 1425: 1423: 1420: 1412: 1411: 1408: 1406: 1398: 1396: 1394: 1390: 1386: 1382: 1376: 1374: 1370: 1326: 1320: 1313: 1311: 1307: 1302: 1299: 1295: 1294:email clients 1291: 1286: 1284: 1280: 1276: 1272: 1268: 1264: 1260: 1256: 1247: 1242: 1241: 1240: 1233: 1228: 1227: 
1226: 1224: 1221: 1217: 1213: 1205: 1203: 1201: 1197: 1196:HTML entities 1193: 1189: 1185: 1181: 1177: 1173: 1167: 1163: 1155: 1150: 1147: 1145: 1142: 1140: 1137: 1135: 1132: 1131: 1129: 1127: 1123: 1119: 1115: 1107: 1104: 1102: 1099: 1097: 1094: 1092: 1089: 1087: 1084: 1082: 1079: 1078: 1076: 1067: 1061: 1058: 1045: 1038: 1035: 1031: 1027: 1013: 1010: 1004: 1001: 994: 991: 989: 986: 984: 981: 979: 976: 974: 971: 969: 965: 962: 960: 957: 955: 952: 950: 947: 945: 942: 940: 937: 935: 932: 930: 924: 922: 919: 917: 914: 912: 909: 907: 904: 902: 901:Code page 866 899: 898: 896: 887: 881: 878: 874: 870: 865: 858: 855: 851: 845: 842: 838: 834: 828: 825: 821: 817: 813: 807: 804: 800: 796: 792: 786: 783: 779: 774: 771: 767: 764:(U+20AC; see 763: 759: 753: 750: 745: 738: 735: 730: 724: 717: 714: 709: 703: 696: 693: 685: 683: 680: 678: 675: 673: 670: 668: 665: 663: 660: 658: 655: 653: 650: 648: 645: 643: 640: 638: 635: 633: 630: 628: 625: 623: 620: 618: 615: 613: 610: 608: 605: 603: 600: 598: 595: 593: 590: 589: 587: 584: 583:exclusively. 
582: 577: 573: 565: 563: 559: 557: 556:byte-oriented 553: 549: 545: 540: 538: 534: 530: 522: 519: 516: 512: 509: 506: 505: 504: 502: 494: 492: 490: 486: 481: 468: 466: 462: 436: 434: 399: 392: 389: 384: 380: 368: 362: 354: 349: 341: 339: 335: 331: 327: 323: 311: 306: 304: 299: 297: 292: 291: 289: 288: 283: 280: 278: 275: 274: 273: 272: 267: 262: 259: 257: 254: 252: 249: 247: 244: 240: 237: 236: 235: 232: 228: 225: 223: 220: 218: 215: 214: 213: 210: 206: 203: 202: 201: 198: 196: 193: 191: 188: 184: 181: 180: 179: 176: 174: 171: 169: 166: 164: 163:Language code 161: 157: 154: 152: 149: 148: 147: 144: 142: 139: 137: 134: 130: 129:alt attribute 127: 126: 125: 122: 118: 115: 113: 110: 108: 105: 103: 100: 99: 98: 95: 91: 88: 86: 83: 82: 81: 78: 74: 71: 69: 66: 64: 61: 59: 56: 55: 54: 51: 49: 46: 45: 44: 43: 40: 36: 33: 29: 22: 2403:, retrieved 2395: 2378: 2365: 2352: 2339: 2326: 2313: 2300: 2287: 2274: 2261: 2248: 2236: 2227: 2214: 2185: 2172: 2159: 2146: 2133: 2120: 2107: 2094: 2081: 2052: 2039: 2026: 2013: 2000: 1971: 1958: 1945: 1932: 1919: 1906: 1897: 1866: 1853: 1840: 1811: 1782: 1749: 1726: 1717: 1708: 1699: 1685: 1674:, retrieved 1666: 1649: 1638:, retrieved 1629: 1601: 1590:, retrieved 1566: 1556: 1508:rather than 1498:&eacute; 1495: 1402: 1377: 1329:&lambda; 1324: 1318: 1314: 1303: 1298:text editors 1290:web browsers 1287: 1282: 1278: 1274: 1270: 1262: 1254: 1252: 1245: 1238: 1231: 1222: 1211: 1209: 1195: 1191: 1187: 1175: 1171: 1169: 1111: 1074: 1060: 1037: 1012: 1003: 978:Windows-1253 973:Mac OS Roman 894: 880: 857: 844: 827: 806: 785: 773: 752: 737: 716: 695: 647:Windows-1258 642:Windows-1257 637:Windows-1256 632:Windows-1255 627:Windows-1254 622:Windows-1252 617:Windows-1251 612:Windows-1250 585: 569: 560: 541: 536: 526: 498: 476: 459: 431: 393: 390: 376: 365:Content-Type 350: 347: 319: 178:Style sheets 145: 107:div and span 97:HTML element 48:Dynamic HTML 32: 2258:"10.1. 
GBK" 1267:hexadecimal 1184:hexadecimal 1139:ISO-2022-CN 1134:ISO-2022-KR 1122:ISO-2022-JP 954:ISO-8859-16 949:ISO-8859-15 944:ISO-8859-14 939:ISO-8859-13 934:ISO-8859-10 926:ISO-8859-8- 799:Windows-31J 766:Windows-936 708:ISO-8859-11 667:ISO-2022-JP 607:Windows-874 351:First, the 269:Comparisons 256:Web storage 251:Quirks mode 190:Font family 141:HTML editor 2452:Categories 1900:. docs.rs. 1548:References 1510:&#XA1b 1506:&#xA1b 1487:apostrophe 1480:&apos; 1464:&quot; 1393:whitespace 1365:&apos; 1341:&quot; 1269:form. The 1261:form, and 1223:code point 1149:HZ-GB-2312 1066:JIS X 0212 921:ISO-8859-6 916:ISO-8859-5 911:ISO-8859-4 906:ISO-8859-3 744:ISO-8859-9 729:ISO-8859-1 602:ISO-8859-8 597:ISO-8859-7 592:ISO-8859-2 409:http-equiv 353:web server 326:characters 200:JavaScript 195:Web colors 136:HTML frame 1752:. WHATWG. 1422:ampersand 1414:&amp; 1345:&amp; 831:Actually 816:Shift Out 762:euro sign 662:Shift JIS 537:incorrect 359:" in the 340:display. 334:integrity 239:Validator 2384:Bray, T. 2240:Bugzilla 1655:Bray, T. 1587:14399078 1520:See also 1448:&gt; 1432:&lt; 1337:&gt; 1333:&lt; 1288:Not all 1049:GB 18030 820:Shift In 682:UTF-16LE 677:UTF-16BE 652:GB 18030 489:UTF-16LE 485:UTF-16BE 2405:8 March 1676:8 March 1592:30 July 1490:U+0027 1474:U+0022 1458:U+003E 1442:U+003C 1426:U+0026 1381:Unicode 1259:decimal 1244:&#x 1220:Unicode 1180:decimal 1053:GB 2312 1026:KOI8-RU 1022:KOI8-RU 968:KOI8-RU 702:TIS-620 529:English 446:charset 418:content 363:(HTTP) 357:charset 338:browser 156:Unicode 117:marquee 58:article 2370:WHATWG 2344:WHATWG 2318:WHATWG 2292:WHATWG 2266:WHATWG 2219:WHATWG 2190:WHATWG 2164:WHATWG 2138:WHATWG 2112:WHATWG 2086:WHATWG 2057:WHATWG 2031:WHATWG 2005:WHATWG 1976:WHATWG 1950:WHATWG 1924:WHATWG 1871:WHATWG 1845:WHATWG 1816:WHATWG 1787:WHATWG 1729:. W3C. 1711:. W3C. 1640:28 May 1585:  1323:where 1253:where 1230:&# 1126:UTF-16 1106:UTF-32 1101:EBCDIC 1091:BOCU-1 1081:CESU-8 1044:GB2312 1018:KOI8-U 993:EUC-JP 964:KOI8-U 959:KOI8-R 864:UTF-16 839:block. 
672:EUC-KR 572:WHATWG 552:UTF-32 548:UTF-16 383:module 246:WHATWG 222:WebGPU 68:canvas 1630:HTML5 1583:S2CID 1514:XHTML 1418:& 1385:UTF-8 1369:HTML5 1361:& 1317:& 1306:ASCII 1194:, or 1186:) or 1086:UTF-7 873:UTF-8 850:UTF-8 723:ASCII 581:UTF-8 544:UTF-8 501:UTF-8 461:XHTML 433:HTML5 330:ASCII 227:WebXR 217:WebGL 212:Web3D 205:WebCL 112:blink 85:Basic 80:XHTML 73:video 63:audio 53:HTML5 2458:HTML 2407:2010 1678:2010 1642:2018 1594:2014 1571:IETF 1452:> 1436:< 1359:and 1353:> 1349:< 1343:and 1325:name 1319:name 1283:hhhh 1279:hhhh 1275:nnnn 1263:hhhh 1255:nnnn 1246:hhhh 1232:nnnn 1200:SGML 1164:and 1124:and 1096:SCSU 1020:and 818:and 793:and 657:Big5 570:The 487:and 455:> 443:meta 440:< 427:> 406:meta 403:< 396:head 322:HTML 102:meta 39:HTML 2400:W3C 2396:XML 2238:W3C 1671:W3C 1667:XML 1575:doi 1405:XML 1292:or 1277:or 1239:or 1182:or 1030:Ў/ў 988:GBK 795:NEC 791:IBM 576:W3C 550:or 533:CJK 465:XML 234:W3C 183:CSS 2454:: 2398:, 2394:, 2368:. 2364:. 2360:. 2342:. 2338:. 2334:. 2316:. 2312:. 2308:. 2290:. 2286:. 2282:. 2264:. 2260:. 2256:. 2235:. 2217:. 2213:. 2209:. 2198:^ 2188:. 2184:. 2180:. 2162:. 2158:. 2154:. 2136:. 2132:. 2128:. 2110:. 2106:. 2102:. 2084:. 2080:. 2076:. 2065:^ 2055:. 2051:. 2047:. 2029:. 2025:. 2021:. 2003:. 1999:. 1995:. 1984:^ 1974:. 1970:. 1966:. 1948:. 1944:. 1940:. 1922:. 1918:. 1914:. 1896:. 1892:. 1879:^ 1869:. 1865:. 1861:. 1843:. 1839:. 1835:. 1824:^ 1814:. 1810:. 1806:. 1795:^ 1785:. 1781:. 1777:. 1758:^ 1748:. 1735:^ 1725:. 1707:. 1669:, 1665:, 1632:, 1628:, 1615:^ 1581:, 1573:, 1569:, 1512:. 1375:. 1355:, 1351:, 1339:, 1335:, 1210:A 1202:. 966:/ 726:, 705:, 513:A 388:. 2372:. 2346:. 2320:. 2294:. 2268:. 2221:. 2192:. 2166:. 2140:. 2114:. 2088:. 2059:. 2033:. 2007:. 1978:. 1952:. 1926:. 1873:. 1847:. 1818:. 1789:. 1693:. 1609:. 1577:: 1502:x 1484:' 1468:" 1357:" 1321:; 1271:x 1248:; 1234:; 1218:/ 1178:( 928:I 875:. 852:. 801:. 449:= 421:= 412:= 309:e 302:t 295:v 30:. 23:.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.