Structure mining

Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining.

Description

The growth of the use of semi-structured data has created new opportunities for data mining, which has traditionally been concerned with tabular data sets, reflecting the strong association between data mining and relational databases. Much of the world's interesting and mineable data does not fit easily into relational databases, though a generation of software engineers has been trained to believe that this is the only way to handle data, and data mining algorithms have generally been developed only to cope with tabular data.

XML, the most frequent way of representing semi-structured data, is able to represent both tabular data and arbitrary trees. Any particular representation of data to be exchanged between two applications in XML is normally described by a schema, often written in XSD. Practical examples of such schemata, for instance NewsML, are normally very sophisticated, containing multiple optional subtrees used for representing special-case data. Frequently around 90% of a schema is concerned with the definition of these optional data items and subtrees.

Messages and data that are transmitted or encoded using XML and that conform to the same schema are therefore liable to contain very different data, depending on what is being transmitted.

Such data presents large problems for conventional data mining. Two messages that conform to the same schema may have little data in common. If one tries to build a training set by formatting such data as a table for conventional data mining, large sections of the table would or could be empty.

There is a tacit assumption in the design of most data mining algorithms that the data presented will be complete. A further requirement is that the mining algorithms employed, whether supervised or unsupervised, must be able to handle sparse data. Machine learning algorithms perform badly with incomplete data sets in which only part of the information is supplied. For instance, methods based on neural networks or Ross Quinlan's ID3 algorithm are highly accurate with good, representative samples of the problem, but perform badly with biased data. Most of the time, a better model presentation, with a more careful and unbiased representation of input and output, is enough. A particularly relevant area in which finding the appropriate structure and model is the key issue is text mining.

XPath is the standard mechanism used to refer to nodes and data items within XML. It resembles the standard techniques for navigating directory hierarchies in operating system user interfaces. To data mine and structure mine XML data of any form, at least two extensions to conventional data mining are required: the ability to associate an XPath statement with any data pattern, and sub-statements with each data node in the data pattern; and the ability to mine the presence and count of any node or set of nodes within the document.

As an example, if one were to represent a family tree in XML, these extensions would allow one to create a data set containing all the individual nodes in the tree, data items such as name and age at death, and counts of related nodes, such as the number of children. More sophisticated searches could extract data such as grandparents' lifespans.

The addition of these data types related to the structure of a document or message facilitates structure mining.
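The family tree example can be illustrated with a minimal Python sketch. It is not taken from any particular structure mining tool: the XML element names (family, person, name, ageAtDeath, children), the COLUMNS table and the helper functions are all hypothetical, and the paths use only the limited XPath subset supported by the standard library's xml.etree.ElementTree. Each column pairs a path, evaluated relative to a person node, with one of three mining modes: value, presence or count.

```python
import xml.etree.ElementTree as ET

# Hypothetical family tree; element names are assumptions for illustration.
FAMILY_TREE = """
<family>
  <person>
    <name>Ada</name>
    <ageAtDeath>81</ageAtDeath>
    <children>
      <person>
        <name>Ben</name>
        <ageAtDeath>67</ageAtDeath>
        <children>
          <person><name>Cy</name></person>
          <person><name>Dee</name></person>
        </children>
      </person>
      <person>
        <name>Eve</name>
      </person>
    </children>
  </person>
</family>
"""

# Column name -> (path relative to a person node, mining mode).
COLUMNS = {
    "name": ("name", "value"),
    "age_at_death": ("ageAtDeath", "value"),
    "has_children": ("children", "presence"),
    "n_children": ("children/person", "count"),
    "n_grandchildren": ("children/person/children/person", "count"),
}


def mine_node(node, path, mode):
    """Evaluate one path against a node and return a value, flag or count."""
    matches = node.findall(path)
    if mode == "count":
        return len(matches)
    if mode == "presence":
        return bool(matches)
    # "value": text of the first match, or None when the item is absent.
    return matches[0].text if matches else None


def build_rows(xml_text):
    """Return one row (dict) per person node, in document order."""
    root = ET.fromstring(xml_text)
    return [
        {col: mine_node(person, path, mode)
         for col, (path, mode) in COLUMNS.items()}
        for person in root.iter("person")
    ]


rows = build_rows(FAMILY_TREE)
for row in rows:
    print(row)
```

Individuals without an ageAtDeath element get None in that column, which is the kind of empty table cell described above. The presence and count columns, by contrast, are always defined, because they describe the structure of the document instead of an optional data item.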

See also

Graph kernel
Structured content
Inductive programming

External links

The 5th International Workshop on Mining and Learning with Graphs, Firenze, Aug 1-3, 2007