Semi-structured data

Semi-structured data is a form of structured data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as self-describing structure.

In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, and the order of the attributes is not important.

Semi-structured data have become increasingly common since the advent of the Internet, where full-text documents and databases are no longer the only forms of data and different applications need a medium for exchanging information. Semi-structured data are also often found in object-oriented databases.

Types

XML

XML, other markup languages, email, and EDI are all forms of semi-structured data. OEM (Object Exchange Model) was created prior to XML as a means of self-describing a data structure. XML has been popularized by web services developed using SOAP principles.

Some types of data described here as "semi-structured", especially XML, suffer from the impression that they are incapable of structural rigor at the same functional level as relational tables and rows. Indeed, the view of XML as inherently semi-structured (it was previously referred to as "unstructured") has handicapped its use for a widening range of data-centric applications. Even documents, normally thought of as the epitome of semi-structure, can be designed with virtually the same rigor as a database schema, with that rigor enforced by an XML schema, and processed by both commercial and custom software programs without reducing their usability by human readers.

In view of this, XML might be described as having a "flexible structure", capable of human-centric flow and hierarchy as well as highly rigorous element structure and data typing.

The concept of XML as "human-readable", however, can only be taken so far. Some implementations or dialects of XML, such as the XML representation of the contents of a Microsoft Word document as implemented in Office 2007 and later versions, use dozens or even hundreds of different kinds of tags that reflect a particular problem domain: in Word's case, formatting at the character, paragraph, and document level, definitions of styles, inclusion of citations, and so on. These tags are nested within each other in complex ways. Understanding even a portion of such an XML document by reading it, let alone catching errors in its structure, is impossible without a deep prior understanding of the specific XML implementation, along with assistance from software that understands the XML schema that has been employed. Such text is no more "human-understandable" than a book written in Swahili (which uses the Latin alphabet) would be to an American or Western European who does not know a word of that language: the tags are symbols that are meaningless to a person unfamiliar with the domain.
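As a rough illustration of "tags or other markers" and of same-class entities carrying different attributes, the following Python sketch parses a small XML fragment with the standard-library module xml.etree.ElementTree. The <people>/<person> vocabulary and the load_people helper are hypothetical, invented only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <person> elements of the same "class" carry
# different sets of child elements, and the tag names describe the data.
DOC = """
<people>
  <person id="1">
    <name>Alice</name>
    <email>alice@example.org</email>
  </person>
  <person id="2">
    <name>Bob</name>
    <phone>555-0100</phone>
    <phone>555-0199</phone>
  </person>
</people>
"""


def load_people(xml_text):
    """Turn each <person> element into a dict; repeated tags become lists."""
    root = ET.fromstring(xml_text.strip())
    people = []
    for person in root.findall("person"):
        record = {"id": person.get("id")}
        for child in person:
            # The tag name itself serves as the field name (self-describing).
            record.setdefault(child.tag, []).append(child.text)
        people.append(record)
    return people


if __name__ == "__main__":
    for p in load_people(DOC):
        print(p)
```

Repeated tags are collected into lists because nothing in the fragment itself says how many times a tag may occur; an XML schema could pin that down, which is the kind of rigor the XML discussion refers to.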
JSON

JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects. JSON has been popularized by web services developed using REST principles.

Databases such as MongoDB and Couchbase store data natively in JSON or a JSON-like format (MongoDB uses BSON, a binary encoding of JSON-like documents), leveraging the strengths of a semi-structured data architecture.
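A comparable sketch for JSON, using Python's built-in json module: objects in one array need not share the same keys, so optional fields are read defensively. The payload and the field_coverage helper are hypothetical.

```python
import json

# Hypothetical payload: each object is self-describing, and objects in the
# same array do not share the same keys (one has nested data and a list).
PAYLOAD = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.org"},
  {"id": 2, "name": "Bob", "phones": ["555-0100", "555-0199"],
   "address": {"city": "Springfield"}}
]
"""


def field_coverage(records):
    """Count how many records carry each top-level key."""
    counts = {}
    for record in records:
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts


records = json.loads(PAYLOAD)

# No schema guarantees "address", so fall back to an empty dict when absent.
cities = [r.get("address", {}).get("city") for r in records]

print(field_coverage(records))
print(cities)
```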
Pros and cons

Advantages

Programmers persisting objects from their application to a database do not need to worry about object-relational impedance mismatch, but can often serialize objects via a lightweight library.
Support for nested or hierarchical data often simplifies data models representing complex relationships between entities.
Support for lists of objects simplifies data models by avoiding messy translations of lists into a relational data model.

Disadvantages

The traditional relational data model has a popular and ready-made query language, SQL; semi-structured data has no equally universal counterpart.
Semi-structured data is prone to "garbage in, garbage out": by removing constraints from the data model, less forethought is needed to operate a data application, which also leaves less protection against malformed or inconsistent data.
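The following Python sketch illustrates both sides of this trade-off. The standard library's dataclasses and json modules stand in for the "lightweight library" (an assumption; the text names none): a nested list of objects serializes without any relational mapping, and a record with a misspelled key is accepted without complaint. The Order and LineItem classes and the missing_fields check are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    # A list of nested objects is kept inline; no separate table or join is needed.
    items: List[LineItem] = field(default_factory=list)


order = Order(7, "Alice", [LineItem("A-1", 2), LineItem("B-9", 1)])

# asdict() recurses into nested dataclasses inside lists, so the whole
# object graph serializes in one call, without an ORM mapping layer.
doc = json.dumps(asdict(order))

# Nothing in the format itself rejects a malformed record ("garbage in"):
bad = json.dumps({"order_id": 8, "custmer": "Bob"})  # note the misspelled key


def missing_fields(text, required=("order_id", "customer", "items")):
    """Hypothetical application-level check that the data model does not enforce."""
    data = json.loads(text)
    return [name for name in required if name not in data]


print(doc)
print(missing_fields(doc), missing_fields(bad))
```

Any such validation has to live in application code (or in an optional schema layer), which is exactly the forethought the data model no longer demands.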
Semi-structured model

The semi-structured model is a database model in which there is no separation between the data and the schema, and the amount of structure used depends on the purpose.

The advantages of this model are the following:
It can represent the information of some data sources that cannot be constrained by a schema.
It provides a flexible format for data exchange between different types of databases.
It can be helpful to view structured data as semi-structured (for browsing purposes).
The schema can easily be changed.
The data transfer format may be portable.

The primary trade-off in using a semi-structured database model is that queries cannot be made as efficiently as in a more constrained structure, such as the relational model. Typically, the records in a semi-structured database are stored with unique IDs that are referenced with pointers to their location on disk. This makes navigational or path-based queries quite efficient, but searches over many records (as is typical in SQL) are less efficient because the system has to seek around the disk following pointers.

The Object Exchange Model (OEM) is one standard for expressing semi-structured data; another is XML.
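To show why path-based queries are cheap and broad searches are not, here is a toy in-memory store loosely modelled on OEM: each object has a unique ID and either an atomic value or labelled references to other objects. The store layout, the IDs, and both functions are hypothetical, and dictionary lookups merely stand in for following pointers to locations on disk.

```python
# Hypothetical store loosely modelled on OEM: every object has a unique ID
# (oid) and holds either an atomic "value" or a list of (label, oid) "edges".
# Dictionary lookups stand in for following pointers to disk locations.
STORE = {
    "&1": {"edges": [("person", "&2"), ("person", "&3")]},
    "&2": {"edges": [("name", "&4"), ("email", "&5")]},
    "&3": {"edges": [("name", "&6"), ("phone", "&7")]},
    "&4": {"value": "Alice"},
    "&5": {"value": "alice@example.org"},
    "&6": {"value": "Bob"},
    "&7": {"value": "555-0100"},
}


def follow_path(store, root, path):
    """Navigational query: follow labelled edges from root, e.g. "person.name"."""
    frontier = [root]
    for label in path.split("."):
        frontier = [
            child
            for oid in frontier
            for (edge_label, child) in store[oid].get("edges", [])
            if edge_label == label
        ]
    return [store[oid]["value"] for oid in frontier if "value" in store[oid]]


def scan_for_value(store, predicate):
    """Search over many records: with no path to guide it, visit every object."""
    return sorted(
        oid for oid, obj in store.items() if "value" in obj and predicate(obj["value"])
    )


print(follow_path(STORE, "&1", "person.name"))
print(scan_for_value(STORE, lambda v: "@" in v))
```

follow_path touches only the objects reachable along the given labels, while scan_for_value must inspect every object in the store; when objects live at scattered disk locations, each of those visits can mean another seek.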
See also

Semi-structured model
NoSQL
Unstructured data
Structured data

External links

UPenn Database Group – semi-structured data and XML
Semi-Structured data analytics: Relational or Hadoop platform? by IBM
