Knowledge (XXG)

Structured document

A structured document is an electronic document where some method of markup is used to identify the whole and parts of the document as having various meanings beyond their formatting. For example, a structured document might identify a certain portion as a "chapter title" (or "code sample" or "quatrain") rather than as "Helvetica bold 24" or "indented Courier". Such portions in general are commonly called "components" or "elements" of a document.

Overview

Structured documents generally focus on labeling things that can be used for a variety of processing purposes, not merely formatting. For example, explicit labeling of "chapter title" or "emphasis" is far more useful to systems for the visually impaired than merely "Helvetica bold 24" or "italic". In the same way, meaningful labeling of the many items on a technical information sheet enables far better integration with databases, search systems, online catalogs, and so on.

Structured documents generally support at least hierarchical structures, for example lists, not merely list items; sections, not merely section headings; and so on. This is in stark contrast to formatting-oriented systems. High-end systems also support multiple independent and/or overlapping sets of components.

Structured document systems commonly permit creating explicit rules defining component types and how they may be combined. Such a set of rules is called a "schema" by analogy with database schemas. Several formal languages exist for specifying them, such as XSD, Relax NG, and Schematron. A structured document which obeys the rules of the schema is commonly called "valid according to that schema". Some systems also support documents with components of arbitrary types and combinations, but still with syntactic rules for how those components are identified.

Lie and Saarela noted the "Standard Generalized Markup Language (SGML) has pioneered the concept of structured documents", although earlier systems such as Scribe, Augment, and FRESS provided many structured-document features and capabilities, and SGML's offspring XML is now favored.

One very widely used representation for structured documents is HTML, a schema defined and described by the W3C. However, HTML has not only tags for meaning-oriented components such as paragraph, title, and code, but also format-oriented ones such as italic, bold, and most table tags. In practice, HTML is sometimes used as a structured document system, but often used as a formatting language.

Many domains use structured documents via domain-specific schemas they have co-operatively developed, such as JATS for journal publishing, TEI for literary documents, UBL and EDI for business interchange, XTCE for spacecraft telemetry, REST for Web interfaces, and countless more. All these cases use specific schemas based on XML.

"XML is the universal format for structured documents and data on the Web" (XHTML2 Working Group, W3C)

Structural semantics

See also: Semantic markup and Separation of content and presentation

In writing structured documents the focus is on encoding the logical structure of a document, with less or even no explicit work devoted to its presentation to humans by printed pages or screens (in some cases, no such use is even expected). Structured documents can easily be processed by computer systems to extract and present derivative forms of the document. In most Knowledge (XXG) articles, for example, a table of contents is automatically generated from the different heading tags in the body of the document. Because the SGML conversion of the Oxford English Dictionary explicitly distinguished the many different meanings which attach to the print version's use of italics, search tools can retrieve entries based on etymology, quotations, and many other features of interest. When HTML provides structural rather than merely formatting information, visually impaired users can be easily given a more useful reading interface. When travel companies provide itineraries as structured documents rather than just displays, user tools can easily extract the necessary facts and pass them on to calendar or other applications.

In HTML, a part of the logical structure of a document may be the document body (<body>), containing a first-level heading (<h1>) and a paragraph (<p>).

    <body>
    <h1>Structured document</h1>
    <p>A <strong class="selflink">structured document</strong> is an
    <a href="/Electronic_document" title="Electronic document">electronic document</a>
    where some method of
    <a href="/Markup_language" title="Markup language">markup</a>
    is used to identify the whole and parts of the document as having
    various meanings beyond their formatting.</p>
    </body>

One of the most attractive features of structured documents is that they can be reused in many contexts and presented in various ways on mobile phones, TV screens, speech synthesisers, and any other device which can be programmed to process them.

Other semantics

Other meaning can be ascribed to text which isn't "structural" in quite the same sense as larger objects, but is still considered "document structure" because it expresses claims about the scope and nature or ontology of portions of a document, rather than instructions about its presentation. In the HTML fragment above, the <strong> element means that the enclosed text is emphatic. In visual terms this is commonly rendered via bold, just like <b>; but a speech interface would instead likely use voice inflection. The term semantic markup excludes markup like <b>, which directly expresses no meaning other than an instruction to a visual display (although an intelligent agent may be able to discern a structural meaning lurking behind the tag). The "strong" tag is "descriptive" or "structural" in that it is intended to label an abstract, quasi-linguistic property of its content, rather than to describe the appropriate presentation in some particular medium.

Some other structural tags in HTML include <abbr>, <acronym>, <address>, <cite>, <del>, <dfn>, <ins>, <kbd>, and <q>. Other schemas such as DocBook and TEI have far larger selections.

The anchor <a> tag is used for another slightly different kind of structure, namely the interconnection or cross-reference structure, rather than the interval section division. This is most definitely structure, and in fact it is possible to create alternate markup for documents that expresses the same particular structures in either way (for example, using transclusion to represent section contents, rather than navigational hyperlink presentations).

HTML from early on has also had tags which express presentational semantics, such as bold (<b>) or italic (<i>), tags to alter font sizes, and tags which had other effects on the presentation. Modern versions of markup languages discourage such markup in favor of descriptive markup which is mapped to particular presentations via style sheets, a method pioneered by systems such as Scribe and FRESS. Different style sheets can be attached to any markup, semantic or presentational, to produce different presentations, although mapping a tag name "italic" to boldface presentation is not entirely intuitive.

Context and intent

In principle, just what constitutes "structure" vs. non-structure can vary. In a book specifically about typography, tagging something as "italic" or "bold" may well be the whole point. For example, a discussion of when to use particular styles will likely want to give examples and counter-examples, which would no longer make sense if the rendering is not in sync with the prose. Similarly, a particular edition of a document may be of interest not only for its content but for its typographic practice as well, in which case describing that practice is not only desirable but necessary. This problem is not unique to document structure, however; it also arises in grammar when discussing grammar, and in many other cases.
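
The automatic table of contents mentioned under Structural semantics can be sketched in a few lines of Python. This is only a minimal illustration built on the standard-library html.parser module, not a description of how any particular wiki engine works. The names HeadingCollector, table_of_contents and SAMPLE, as well as the sample markup itself, are assumptions made for this example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect a (level, text) pair for every <h1>..<h6> element."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.toc = []        # finished (level, text) entries
        self._level = None   # level of the heading currently being read
        self._parts = []     # text pieces seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) still arrives here.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.toc.append((self._level, "".join(self._parts).strip()))
            self._level = None


def table_of_contents(html):
    """Return the headings of an HTML string as (level, text) pairs."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.toc


# Hypothetical sample document, loosely modelled on this article.
SAMPLE = """
<body>
<h1>Structured document</h1>
<p>A <strong>structured document</strong> is an electronic document.</p>
<h2>Overview</h2>
<p>Structured documents focus on labeling things.</p>
<h2>Other <em>semantics</em></h2>
<p>Other meaning can be ascribed to text.</p>
</body>
"""

if __name__ == "__main__":
    for level, title in table_of_contents(SAMPLE):
        # Indent each entry according to its heading level.
        print("  " * (level - 1) + title)
```

The sketch relies only on the heading tags, so it keeps working if a style sheet changes how headings look. If the same headings were marked up purely as large bold text, there would be nothing reliable for it to extract.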

See also

Document processor
Machine-readable document
Overlapping markup
Structured writing

References

DeRose, Steven (2004). Markup Overlap: A Review and a Horse. Extreme Markup Languages 2004. Montréal. CiteSeerX 10.1.1.108.9959. Retrieved 2014-10-14.
Håkon Wium Lie; Janne Saarela (1998). "Multi-purpose publishing using HTML, XML, and CSS". W3.org. Association for Computing Machinery.
"A sample HTML instance". Retrieved 5 March 2014.

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.