Knowledge (XXG)

Plain text

Term for computer data consisting only of unformatted characters of readable material

For the cryptography meaning, see Plaintext. For other uses, see Text (disambiguation).

[Figure: Text file with portion of The Human Side of Animals by Royal Dixon, displayed by the command cat in an xterm window]

In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.).

The term is sometimes used quite loosely, to mean files that contain only "readable" content (or just files with nothing that the speaker does not prefer). For example, that could exclude any indication of fonts or layout (such as markup, markdown, or even tabs); characters such as curly quotes, non-breaking spaces, soft hyphens, em dashes, and/or ligatures; or other things.

In principle, plain text can be in any encoding, but occasionally the term is taken to imply ASCII. As Unicode-based encodings such as UTF-8 and UTF-16 become more common, that usage may be shrinking.

Plain text is also sometimes used only to exclude "binary" files: those in which at least some parts of the file cannot be correctly interpreted via the character encoding in effect. For example, a file or string consisting of "hello" (in any encoding), followed by 4 bytes that express a binary integer that is not a character, is a binary file. Converting a plain text file to a different character encoding does not change the meaning of the text, as long as the correct character encoding is used. However, converting a binary file to a different format may alter the interpretation of the non-textual data.

Plain text and rich text

According to The Unicode Standard:

"Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes. In contrast, styled text, also known as rich text, is any text representation containing plain text plus added information such as a language identifier, font size, color, hypertext links, and so on. SGML, RTF, HTML, XML, and TeX are examples of rich text fully represented as plain text streams, interspersing plain text data with sequences of characters that represent the additional data structures."

According to other definitions, however, files that contain markup or other meta-data are generally considered plain text, so long as the markup is also in a directly human-readable form (as in HTML, XML, and so on). Thus, representations such as SGML, RTF, HTML, XML, wiki markup, and TeX, as well as nearly all programming language source code files, are considered plain text. The particular content is irrelevant to whether a file is plain text. For example, an SVG file can express drawings or even bitmapped graphics, but is still plain text.

The use of plain text rather than binary files enables files to survive much better "in the wild", in part by making them largely immune to computer architecture incompatibilities. For example, all the problems of endianness can be avoided (with encodings such as UCS-2 rather than UTF-8, endianness matters, but uniformly for every character, rather than for potentially-unknown subsets of it).

Usage

The purpose of using plain text today is primarily independence from programs that require their very own special encoding or formatting or file format. Plain text files can be opened, read, and edited with ubiquitous text editors and utilities.

A command-line interface allows people to give commands in plain text and get a response, also typically in plain text.

Many other computer programs are also capable of processing or creating plain text, such as countless programs in DOS, Windows, classic Mac OS, and Unix and its kin; as well as web browsers (a few browsers such as Lynx and the Line Mode Browser produce only plain text for display) and other e-text readers.

Plain text files are almost universal in programming; a source code file containing instructions in a programming language is almost always a plain text file. Plain text is also commonly used for configuration files, which are read for saved settings at the startup of a program.

Plain text is used for much e-mail.

A comment, a ".txt" file, or a TXT Record generally contains only plain text (without formatting) intended for humans to read.

The best format for storing knowledge persistently is plain text, rather than some binary format.

Encoding

Character encodings

Main article: Character encoding

Before the early 1960s, computers were mainly used for number-crunching rather than for text, and memory was extremely expensive. Computers often allocated only 6 bits for each character, permitting only 64 characters: assigning codes for A-Z, a-z, and 0-9 would leave only 2 codes, nowhere near enough. Most computers opted not to support lower-case letters. Thus, early text projects such as Roberto Busa's Index Thomisticus, the Brown Corpus, and others had to resort to conventions such as keying an asterisk preceding letters actually intended to be upper-case.

Fred Brooks of IBM argued strongly for going to 8-bit bytes, because someday people might want to process text, and won. Although IBM used EBCDIC, most text from then on came to be encoded in ASCII, using values from 0 to 31 for (non-printing) control characters, and values from 32 to 127 for graphic characters such as letters, digits, and punctuation. Most machines stored characters in 8 bits rather than 7, ignoring the remaining bit or using it as a checksum.

The near-ubiquity of ASCII was a great help, but failed to address international and linguistic concerns. The dollar-sign ("$") was not as useful in England, and the accented characters used in Spanish, French, German, Portuguese, and many other languages were entirely unavailable in ASCII (not to mention characters used in Greek, Russian, and most Eastern languages). Many individuals, companies, and countries defined extra characters as needed, often reassigning control characters, or using values in the range from 128 to 255. Using values of 128 and above conflicts with using the 8th bit as a checksum, but the checksum usage gradually died out.

These additional characters were encoded differently in different countries, making texts impossible to decode without figuring out the originator's rules. For instance, a browser might display ¬A rather than ` if it tried to interpret one character set as another. The International Organization for Standardization (ISO) eventually developed several code pages under ISO 8859, to accommodate various languages. The first of these (ISO 8859-1) is also known as "Latin-1", and covers the needs of most (not all) European languages that use Latin-based characters (there was not quite enough room to cover them all). ISO 2022 then provided conventions for "switching" between different character sets in mid-file. Many other organisations developed variations on these, and for many years Windows and Macintosh computers used incompatible variations.

The text-encoding situation became more and more complex, leading to efforts by ISO and by the Unicode Consortium to develop a single, unified character encoding that could cover all known (or at least all currently known) languages. After some conflict, these efforts were unified. Unicode currently allows for 1,114,112 code values, and assigns codes covering nearly all modern text writing systems, as well as many historical ones, and for many non-linguistic characters such as printer's dingbats, mathematical symbols, etc.

Text is considered plain text regardless of its encoding. To properly understand or process it the recipient must know (or be able to figure out) what encoding was used; however, they need not know anything about the computer architecture that was used, or about the binary structures defined by whatever program (if any) created the data.

Perhaps the most common way of explicitly stating the specific encoding of plain text is with a MIME type. For email and HTTP, the default MIME type is "text/plain" (plain text without markup). Another MIME type often used in both email and HTTP is "text/html; charset=UTF-8" (plain text represented using the UTF-8 character encoding with HTML markup). Another common MIME type is "application/json" (plain text represented using the UTF-8 character encoding with JSON markup).

When a document is received without any explicit indication of the character encoding, some applications use charset detection to attempt to guess what encoding was used.

Control codes

Main article: C0 and C1 control codes

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters known as the "C0 set": codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape. They include common characters like the newline and the tab character.

In 8-bit character sets such as Latin-1 and the other ISO 8859 sets, the first 32 characters of the "upper half" (128 to 159) are also control codes, known as the "C1 set". They are rarely used directly; when they turn up in documents which are ostensibly in an ISO 8859 encoding, their code positions generally refer instead to the characters at that position in a proprietary, system-specific encoding, such as Windows-1252 or Mac OS Roman, that use the codes to instead provide additional graphic characters.

Main article: Unicode control characters

Unicode defines additional control characters, including bi-directional text direction override characters (used to explicitly mark right-to-left writing inside left-to-right writing and the other way around) and variation selectors to select alternate forms of CJK ideographs, emoji and other characters.
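The C0 and C1 ranges described in the control codes discussion can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify_code_point` and its labels are hypothetical, and code 127 (DEL) is singled out here as a control character, a detail the text does not spell out. The last lines show one byte, 0x93, read as a C1 control under Latin-1 and as a curly quote under Windows-1252.

```python
def classify_code_point(cp: int) -> str:
    """Label a code point using the ranges discussed in the text.

    The labels are illustrative and not taken from any standard API.
    """
    if 0 <= cp <= 31:
        return "C0 control"       # e.g. tab (9) and newline (10)
    if 32 <= cp <= 126:
        return "ASCII graphic"    # space, letters, digits, punctuation
    if cp == 127:
        return "DEL (control)"    # assumption: treated separately here
    if 128 <= cp <= 159:
        return "C1 control"       # "upper half" controls in ISO 8859 sets
    return "other"


for ch in ["\t", "\n", "A", "~"]:
    print(repr(ch), classify_code_point(ord(ch)))

# The same byte means different things under different encodings.
raw = b"\x93"
as_latin1 = raw.decode("latin-1")   # a C1 control character, U+0093
as_cp1252 = raw.decode("cp1252")    # left double quotation mark, U+201C
print(classify_code_point(ord(as_latin1)), hex(ord(as_cp1252)))
```

A lookup by numeric range is enough for this purpose because the C0 and C1 sets are contiguous blocks; a real application would more likely rely on the `unicodedata` module's general categories.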
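Three claims made earlier about encodings can be checked with a short Python sketch: re-encoding plain text keeps its meaning, reading bytes under the wrong encoding garbles them, and "hello" followed by a 4-byte binary integer is not plain text. The helper names `reencode` and `is_plain_text` are hypothetical, and the decodability check is only a crude assumption about what "plain text" means in practice. Latin-1, for instance, accepts every byte value, so that check is uninformative for it.

```python
import struct


def reencode(data: bytes, source: str, target: str) -> bytes:
    """Convert plain text bytes from one character encoding to another."""
    return data.decode(source).encode(target)


def is_plain_text(data: bytes, encoding: str) -> bool:
    """Crude check: do all bytes decode under the given encoding?"""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text = "naïve café"
utf8_bytes = text.encode("utf-8")
latin1_bytes = reencode(utf8_bytes, "utf-8", "latin-1")

# Different bytes, same text, as long as each is read with its own encoding.
print(utf8_bytes != latin1_bytes, latin1_bytes.decode("latin-1") == text)

# Reading UTF-8 bytes as Latin-1 yields garbled characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)

# "hello" followed by a 4-byte big-endian integer is a binary file.
mixed = "hello".encode("utf-8") + struct.pack(">I", 0xFFFFFFFE)
print(is_plain_text(utf8_bytes, "utf-8"), is_plain_text(mixed, "utf-8"))
```

The integer value is chosen so that its bytes are invalid in UTF-8. A different value, such as one whose bytes all fall in the ASCII range, would slip past this check, which is one reason such tests are heuristics rather than definitions.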
See also

Binary data
Binary file
Binary protocol
Source code
Text file
Text-based protocol
Line wrap and word wrap

References

"The Unicode Standard, version 14.0" (PDF). pp. 18–19.
Andrew Hunt, David Thomas. "The Pragmatic Programmer". 1999. Chapter 14: "The Power of Plain Text". p. 73.


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
