Knowledge (XXG)

Data stream management system

Source đź“ť

25: 353:
involved databases. But, since the data of a data streams is unknown in advance, there are no such statistics in a DSMS. However, it is possible to observe a data stream for a certain time to obtain some statistics. Using these statistics, the query can also be optimized later. So, in contrast to a DBMS, some DSMS allows to optimize the query even during runtime. Therefore, a DSMS needs some plan migration strategies to replace a running query plan with a new one.
379:
of the graph is connected to the outgoing sinks, which may be for example a visualization. Since most DSMSs are data-driven, a query is executed by pushing the incoming data elements from the source through the query plan to the sink. Each time when a data element passes an operator, the operator performs its specific operation on the data element and forwards the result to all successive operators.
217:
the continuous calculation of an average. Instead of memorizing each data point, the synopsis only holds the sum and the number of items. The average can be calculated by dividing the sum by the number. However, it should be mentioned that synopses cannot reflect the data accurately. Thus, a processing that is based on synopses may produce inaccurate results.
231:
windows. There are, for example, approaches that use timestamps or time intervals for system-wide windows or buffer-based windows for each single processing step. Sliding-window query processing is also suitable to being implemented in parallel processors by exploiting parallelism between different windows and/or within each window extent.
361:
Since a logical operator is only responsible for the semantics of an operation but does not consist of any algorithms, the logical query plan must be transformed into an executable counterpart. This is called a physical query plan. The distinction between a logical and a physical operator plan allows
352:
Furthermore, there are also cost-based optimization techniques like in DBMS, where a query plan with the lowest costs is chosen from different equivalent query plans. One example is to choose the order of two successive join operators. In DBMS this decision is mostly done by certain statistics of the
225:
Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the
216:
The idea behind compression techniques is to maintain only a synopsis of the data, but not all (raw) data points of the data stream. The algorithms range from selecting random data points called sampling to summarization using histograms, wavelets or sketching. One simple example of a compression is
378:
Since the physical query plan consists of executable algorithms, it can be directly executed. For this, the physical query plan is installed into the system. The bottom of the graph (of the query plan) is connected to the incoming sources, which can be everything like connectors to sensors. The top
230:
lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data. There are also different approaches to implementing
114:
One important feature of a DSMS is the possibility to handle potentially infinite and rapidly changing data streams by offering flexible processing at the same time, although there are only limited resources such as main memory. The following table provides various principles of DSMS and compares
101:
that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to
328:
In the next step, the declarative query is translated into a logical query plan. A query plan is a directed graph where the nodes are operators and the edges describe the processing flow. Each operator in the query plan encapsulates the semantic of a specific operation, such as filtering or
207:
One of the biggest challenges for a DSMS is to handle potentially infinite data streams using a fixed amount of memory and no random access to the data. There are different approaches to limit the amount of data in one pass, which can be divided into two classes. For the one hand, there are
349:, a query optimizer can use the algebraic equivalences to optimize the plan. These may be, for example, to push selection operators down to the sources, because they are not so computationally intensive like join operators. 243:
processing in DBMS by using declarative languages to express queries, which are translated into a plan of operators. These plans can be optimized and executed. A query processing often consists of the following steps.
226:
last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to
362:
more than one implementation for the same logical operator. The join, for example, is logically the same, although it can be implemented by different algorithms like a
256:
in DBMS. Since there are no standardized query languages to express continuous queries, there are a lot of languages and variations. However, most of them are based on
333:, so that there are operators for selection, projection, join, and set operations. This operator concept allows the very flexible and versatile processing of a DSMS. 341:
The logical query plan can be optimized, which strongly depends on the streaming model. The basic concepts for optimizing continuous queries are equal to those from
97:. A DBMS also offers a flexible query processing so that the information needed can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a 275:
The language strongly depends on the processing model. For example, if windows are used for the processing, the definition of a window has to be expressed. In
325:
This stream continuously calculates the average value of "price" of the last 10 tuples, but only considers those tuples whose prices are greater than 100.0.
208:
compression techniques that try to summarize the data and for the other hand there are window techniques that try to portion the data into (finite) parts.
451: 508: 370:. Notice, these algorithms also strongly depend on the used stream and processing model. Finally, the query is available as a physical query plan. 438: 471: 698: 679: 476: 272:. There are also graphical approaches where each processing step is a box and the processing flow is expressed by arrows between the boxes. 601: 46: 639: 68: 720: 227: 329:
aggregation. In DSMSs that process relational data streams, the operators are equal or similar to the operators of the
424: 39: 33: 90: 239:
Since there are a lot of prototypes, there is no standardized architecture. However, most DSMS are based on the
50: 741: 498: 261: 103: 503: 345:. If there are relational data streams and the logical query plan is based on relational operators from the 269: 640:"Chandrasekaran, S. et al, "TelegraphCQ: Continuous Dataflow Processing for an Uncertain World." CIDR 2003" 572: 530:"Parallel Patterns for Window-Based Stateful Operators on Data Streams: An Algorithmic Skeleton Approach" 240: 446: 421: 397: 435: 577: 456: 549: 346: 330: 736: 694: 675: 646: 541: 363: 442: 367: 342: 714: 593: 730: 486: 553: 481: 413:
NiagaraST: A Research Data Stream Management System at Portland State University
402: 86: 545: 417: 715:
Processing Flows of Information: From Data Stream to Complex Event Processing
461: 279:, a query with a sliding window for the last 10 elements looks like follows: 276: 265: 388: 252:
The formulation of queries is mostly done using declarative languages like
94: 623: 594:"NiagaraCQ: A Scalable Continuous Query System for Internet Databases" 407: 717:- Survey article on Data Stream and Complex Event Processing Systems 466: 412: 93:(DBMS), which is, however, designed for static data in conventional 529: 257: 253: 18: 430: 392: 592:
Jianjun Chen; David J. DeWitt; Feng Tian; Yuan Wang (2000).
85:(DSMS) is a computer software system to manage continuous 528:
De Matteis, Tiziano; Mencagli, Gabriele (25 March 2016).
723:- Introduction to streaming data management with SQL 106:
so that both technologies are partially coalescing.
626:
STREAM: The Stanford Data Stream Management System.
427:-based framework for Data Stream Management Systems 196:Variable data arrival and data characteristics 534:International Journal of Parallel Programming 8: 153:(Theoretically) unlimited secondary storage 628:Technical Report. 2004, Stanford InfoLab. 576: 69:Learn how and when to remove this message 509:Relational data stream management system 164:Consideration of the order of the input 117: 32:This article includes a list of general 569:Aurora: A Data Stream Management System 520: 172:Potentially extremely high update rate 693:. Waterloo, USA: Morgan and Claypool. 689:Golab, Lukasz; Ă–zsu, M. Tamer (2010). 124:Data stream management system (DSMS) 7: 672:Data Streams: Models and Algorithms 161:Only the current state is relevant 121:Database management system (DBMS) 38:it lacks sufficient corresponding 14: 248:Formulation of continuous queries 188:Assumes outdated/inaccurate data 599:. Computer Sciences Department. 23: 602:University of Wisconsin–Madison 203:Processing and streaming models 177:Little or no time requirements 1: 83:data stream management system 129:Persistent data (relations) 670:Aggarwal, Charu C. (2007). 457:SAS Event Stream Processing 193:Plannable query processing 169:Relatively low update rate 758: 721:Stream processing with SQL 447:webMethods Business Events 115:them to traditional DBMS. 91:database management system 546:10.1007/s10766-016-0413-x 357:Transformation of queries 262:Continuous Query Language 499:Complex Event Processing 441:24 December 2016 at the 393:StreamBase Systems, Inc. 281: 104:Complex event processing 16:Computer software system 504:Event stream processing 337:Optimization of queries 180:Real-time requirements 53:more precise citations. 691:Data Stream Management 674:. New York: Springer. 132:Volatile data streams 487:WSO2 Stream Processor 89:. It is similar to a 408:NIAGARA Query Engine 398:Hortonworks DataFlow 374:Execution of queries 156:Limited main memory 110:Functional principle 567:Abadi; et al. 185:Assumes exact data 148:Continuous queries 652:on 7 February 2014 624:Arasu, A., et al. 347:Relational algebra 331:Relational algebra 140:Sequential access 700:978-1-608-45272-9 681:978-0-387-47534-9 200: 199: 145:One-time queries 79: 78: 71: 749: 704: 685: 662: 661: 659: 657: 651: 645:. Archived from 644: 636: 630: 621: 615: 614: 612: 610: 598: 589: 583: 582: 580: 564: 558: 557: 525: 364:Nested loop join 343:database systems 321: 318: 315: 312: 309: 306: 303: 300: 297: 294: 291: 288: 285: 235:Query processing 118: 99:continuous query 74: 67: 63: 60: 54: 49:this article by 40:inline citations 27: 26: 19: 757: 756: 752: 751: 750: 748: 747: 746: 742:Data management 727: 726: 711: 701: 688: 682: 669: 666: 665: 655: 653: 649: 642: 638: 637: 633: 622: 618: 608: 606: 596: 591: 590: 586: 571:. SIGMOD 2003. 566: 565: 561: 527: 526: 522: 517: 495: 443:Wayback Machine 385: 376: 368:Sort-merge join 359: 339: 323: 322: 319: 316: 313: 310: 307: 304: 301: 298: 295: 292: 289: 286: 283: 250: 237: 223: 214: 205: 112: 75: 64: 58: 55: 45:Please help to 44: 28: 24: 17: 12: 11: 5: 755: 753: 745: 744: 739: 729: 728: 725: 724: 718: 710: 709:External links 707: 706: 705: 699: 686: 680: 664: 663: 631: 616: 584: 578:10.1.1.67.8671 559: 540:(2): 382–401. 519: 518: 516: 513: 512: 511: 506: 501: 494: 491: 490: 489: 484: 479: 474: 469: 464: 459: 454: 449: 433: 428: 415: 410: 405: 400: 395: 384: 381: 375: 372: 358: 355: 338: 335: 282: 260:, such as the 249: 246: 236: 233: 222: 219: 213: 210: 204: 201: 198: 197: 194: 190: 189: 186: 182: 181: 178: 174: 173: 170: 166: 165: 162: 158: 157: 154: 150: 149: 146: 142: 141: 138: 137:Random access 134: 133: 130: 126: 125: 122: 111: 108: 77: 76: 31: 29: 22: 15: 13: 10: 9: 6: 4: 3: 2: 754: 743: 740: 738: 735: 734: 732: 722: 719: 716: 713: 712: 708: 702: 696: 692: 687: 683: 677: 673: 668: 667: 648: 641: 635: 632: 629: 627: 620: 617: 604: 603: 595: 588: 585: 579: 574: 570: 563: 560: 555: 551: 547: 543: 539: 535: 531: 524: 521: 514: 510: 507: 505: 502: 500: 497: 496: 492: 488: 485: 483: 480: 478: 477:StreamInsight 475: 473: 470: 468: 465: 463: 460: 458: 455: 453: 450: 448: 444: 440: 437: 434: 432: 429: 426: 423: 419: 416: 414: 411: 409: 406: 404: 401: 399: 396: 394: 390: 387: 386: 382: 380: 373: 371: 369: 365: 356: 354: 350: 348: 344: 336: 334: 332: 326: 302:examplestream 280: 278: 273: 271: 267: 263: 259: 255: 247: 245: 242: 234: 232: 229: 220: 218: 211: 209: 202: 195: 192: 191: 187: 184: 183: 179: 176: 175: 171: 168: 167: 163: 160: 159: 155: 152: 151: 147: 144: 143: 139: 136: 135: 131: 128: 127: 123: 120: 119: 116: 109: 107: 105: 100: 96: 92: 88: 84: 73: 70: 62: 59:November 2018 52: 48: 42: 41: 35: 30: 21: 20: 690: 671: 654:. Retrieved 647:the original 634: 625: 619: 607:. Retrieved 600: 587: 568: 562: 537: 533: 523: 377: 360: 351: 340: 327: 324: 274: 251: 238: 224: 215: 206: 113: 98: 87:data streams 82: 80: 65: 56: 37: 609:21 November 482:TelegraphCQ 472:StreamGlobe 431:Pipeline DB 422:open source 403:IBM Streams 51:introducing 731:Categories 515:References 34:references 656:26 August 573:CiteSeerX 462:SQLstream 277:StreamSQL 266:StreamSQL 95:databases 737:Big data 605:. SIGMOD 493:See also 439:Archived 418:Odysseus 383:Examples 212:Synopses 452:QStream 264:(CQL), 221:Windows 47:improve 697:  678:  575:  554:255600 552:  467:STREAM 389:AURORA 284:SELECT 36:, but 650:(PDF) 643:(PDF) 597:(PDF) 550:S2CID 436:PIPES 420:, an 366:or a 308:value 305:WHERE 293:price 241:query 695:ISBN 676:ISBN 658:2011 611:2018 425:Java 311:> 299:FROM 268:and 228:FIFO 542:doi 314:100 287:AVG 270:ESP 258:SQL 254:SQL 733:: 548:. 538:45 536:. 532:. 445:, 391:, 81:A 703:. 684:. 660:. 613:. 581:. 556:. 544:: 320:0 317:. 296:) 290:( 72:) 66:( 61:) 57:( 43:.

Index

references
inline citations
improve
introducing
Learn how and when to remove this message
data streams
database management system
databases
Complex event processing
FIFO
query
SQL
SQL
Continuous Query Language
StreamSQL
ESP
StreamSQL
Relational algebra
database systems
Relational algebra
Nested loop join
Sort-merge join
AURORA
StreamBase Systems, Inc.
Hortonworks DataFlow
IBM Streams
NIAGARA Query Engine
NiagaraST: A Research Data Stream Management System at Portland State University
Odysseus
open source

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.

↑