Knowledge (XXG)

Data virtualization


Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data.

Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-time access is given to the source system for the data. This reduces the risk of data errors, of the workload moving data around that may never be used, and it does not attempt to impose a single data model on the data (an example of heterogeneous data is a federated database system). The technology also supports the writing of transaction data updates back to the source systems. To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. This concept and software is a subset of data integration and is commonly used within business intelligence, service-oriented architecture data services, cloud computing, enterprise search, and master data management.

Applications, benefits and drawbacks

The defining feature of data virtualization is that the data used remains in its original locations and real-time access is established to allow analytics across multiple sources. This aids in resolving some technical difficulties such as compatibility problems when combining data from various platforms, lowering the risk of error caused by faulty data, and guaranteeing that the newest data is used. Furthermore, avoiding the creation of a new database containing personal information can make it easier to comply with privacy regulations. As a result, data virtualization creates new possibilities for data use.

Building on this, data virtualization's real value, particularly for users, is its declarative approach. Unlike traditional data integration methods that require specifying every step of integration, this approach can be less error-prone and more efficient. Traditional methods are tedious, especially when adapting to changing requirements, involving changes at multiple steps. Data virtualization, in contrast, allows users to simply describe the desired outcome. The software then automatically generates the necessary steps to achieve this result. If the desired outcome changes, updating the description suffices, and the software adjusts the intermediate steps accordingly. This flexibility can accelerate processes by up to five times, underscoring the primary advantage of data virtualization.

However, with data virtualization, the connection to all necessary data sources must be operational as there is no local copy of the data, which is one of the main drawbacks of the approach. Connection problems occur more often in complex systems where one or more crucial sources will occasionally be unavailable. Smart data buffering, such as keeping the data from the most recent few requests in the virtualization system buffer, can help to mitigate this issue.

Moreover, because data virtualization solutions may use large numbers of network connections to read the original data and serve virtualized tables to other solutions over the network, system security requires more consideration than it does with traditional data lakes. In a conventional data lake system, data can be imported into the lake by following specific procedures in a single environment. When using a virtualization system, the environment must separately establish secure connections with each data source, which is typically located in a different environment from the virtualization system itself.

Security of personal data and compliance with regulations can be a major issue when introducing new services or attempting to combine various data sources. When data is delivered for analysis, data virtualization can help to resolve privacy-related problems. Virtualization makes it possible to combine personal data from different sources without physically copying them to another location while also limiting the view to all other collected variables. However, virtualization does not eliminate the requirement to confirm the security and privacy of the analysis results before making them more widely available. Regardless of the chosen data integration method, all results based on personal level data should be protected with the appropriate privacy requirements.

Data virtualization and data warehousing

Some enterprise landscapes are filled with disparate data sources including multiple data warehouses, data marts, and/or data lakes, even though a Data Warehouse, if implemented correctly, should be unique and a single source of truth. Data virtualization can efficiently bridge data across data warehouses, data marts, and data lakes without having to create a whole new integrated physical data platform. Existing data infrastructure can continue performing its core functions while the data virtualization layer just leverages the data from those sources. This aspect of data virtualization makes it complementary to all existing data sources and increases the availability and usage of enterprise data.

Data virtualization may also be considered as an alternative to ETL and data warehousing, but for performance considerations it is not really recommended for a very large data warehouse. Data virtualization is inherently aimed at producing quick and timely insights from multiple sources without having to embark on a major data project with extensive ETL and data storage. However, data virtualization may be extended and adapted to serve data warehousing requirements also. This will require an understanding of the data storage and history requirements along with planning and design to incorporate the right type of data virtualization, integration, and storage strategies, and infrastructure/performance optimizations (e.g., streaming, in-memory, hybrid storage).

Examples

The Phone House, the trading name for the European operations of UK-based mobile phone retail chain Carphone Warehouse, implemented Denodo's data virtualization technology between its Spanish subsidiary's transactional systems and the Web-based systems of mobile operators.
Novartis implemented TIBCO's data virtualization tool to enable its researchers to quickly combine data from both internal and external sources into a searchable virtual data store.
The storage-agnostic Primary Data (defunct, reincarnated as Hammerspace) was a data virtualization platform that enabled applications, servers, and clients to transparently access data while it was migrated between direct-attached, network-attached, private and public cloud storage.
Linked Data can use a single hyperlink-based Data Source Name (DSN) to provide a connection to a virtual database layer that is internally connected to a variety of back-end data sources using ODBC, JDBC, OLE DB, ADO.NET, SOA-style services, and/or REST patterns.
Database virtualization may use a single ODBC-based DSN to provide a connection to a similar virtual database layer.
Alluxio, an open-source virtual distributed file system (VDFS), started at the University of California, Berkeley's AMPLab. The system abstracts data from various file systems and object stores.

Functionality

Data virtualization software provides some or all of the following capabilities:

Abstraction – Abstract the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
Virtualized Data Access – Connect to different data sources and make them accessible from a common logical data access point.
Data transformation – Transform, improve quality, reformat, aggregate etc. source data for consumer use.
Data federation – Combine result sets from across multiple source systems.
Data delivery – Publish result sets as views and/or data services executed by client application or users when requested.

Data virtualization software may include functions for development, operation, and/or management.

A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain.

Benefits include:

Reduce risk of data errors
Reduce systems workload through not moving data around
Increase speed of access to data on a real-time basis
Allows for query processing pushed down to data source instead of in middle tier
Most systems enable self-service creation of virtual databases by end users with access to source systems
Increase governance and reduce risk through the use of policies
Reduce data storage required
Accelerate processes up to five times through the declarative approach

Drawbacks include:

May impact Operational systems response time, particularly if under-scaled to cope with unanticipated user queries or not tuned early on.
Does not impose a heterogeneous data model, meaning the user has to interpret the data, unless combined with Data Federation and business understanding of the data
Requires a defined Governance approach to avoid budgeting issues with the shared services
Not suitable for recording the historic snapshots of data. A data warehouse is better for this
Change management "is a huge overhead, as any changes need to be accepted by all applications and users sharing the same virtualization kit"
Designers should always keep performance considerations in mind

Avoid usage:

For accessing Operational Data Systems (Performance and Operational Integrity issues)
For federating or centralizing all data of the organization (Security and hacking issues)
For building very large virtual Data warehouse (Performance issues)
As an ETL process (Governance and performance issues)
If you have only one or two data sources to virtualize

History

Enterprise information integration (EII) (first coined by Metamatrix), now known as Red Hat JBoss Data Virtualization, and federated database systems are terms used by some vendors to describe a core element of data virtualization: the capability to create relational JOINs in a federated VIEW.

Technology

Some data virtualization solutions and vendors:

AnalyticsCreator
IBM Data Virtualization
Actifio Copy Data Virtualization
Capsenta Ultrawrap, acquired by data.world in 2019
Data Virtuality
DataWerks
Delphix Data Virtualization Platform
Denodo Data Virtualization and Data Fabric Platform
Microsoft Gluent Data Platform
Querona
Red Hat JBoss Enterprise Application Platform Data Virtualization (discontinued)
Stone Bond Technologies Enterprise Enabler Data Virtualization Platform
Stratio Generative AI Data Fabric
Teiid, part of JBoss Developer Studio
TIBCO Data Virtualization
Veritas Provisioning File System / Data Virtualization Veritas Technologies
XAware

Another more up-to-date list with user rankings is compiled by Gartner.

See also

Data integration – Combining data from different sources and providing a unified view
Enterprise information integration – Support a unified view of data and information for an entire organization (EII)
Master data management – Practice for controlling corporate data
Federated database system – type of meta-database management system which transparently maps multiple autonomous database systems into a single federated database
Disparate system – Data processing system without interaction with other computer data processing systems

Further reading

Judith R. Davis; Robert Eve (2011). Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Composite Software. ISBN 978-0979930416.
Rick van der Lans (2012). Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses. Elsevier. ISBN 9780123944252.
Anthony Giordano (2010). Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture. IBM Press. ISBN 9780137085309.

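
The article mentions smart data buffering, that is, keeping the data from the most recent few requests in the virtualization system's buffer, as a way to soften the impact of an unavailable source. The sketch below shows one possible shape of that idea in Python. Everything in it is an assumption made for illustration: the class `BufferedSource`, the exception `SourceUnavailable`, and the `fetch` callable are hypothetical names, and a real product may buffer quite differently.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class SourceUnavailable(Exception):
    """Raised by a fetch function when the underlying source cannot be reached."""


class BufferedSource:
    """Wrap a live source and remember the results of the most recent requests."""

    def __init__(self, fetch: Callable[[Hashable], Any], max_entries: int = 3):
        self._fetch = fetch
        self._max_entries = max_entries
        self._buffer: "OrderedDict[Hashable, Any]" = OrderedDict()

    def query(self, key: Hashable) -> Any:
        try:
            # Always try the live source first so results stay current.
            result = self._fetch(key)
        except SourceUnavailable:
            # Fall back to a buffered result if this request was seen recently.
            if key in self._buffer:
                self._buffer.move_to_end(key)
                return self._buffer[key]
            raise
        # Record the fresh result and evict the least recently used entries.
        self._buffer[key] = result
        self._buffer.move_to_end(key)
        while len(self._buffer) > self._max_entries:
            self._buffer.popitem(last=False)
        return result


if __name__ == "__main__":
    state = {"up": True}

    def fetch(key: Hashable) -> str:
        if not state["up"]:
            raise SourceUnavailable(str(key))
        return f"rows for {key}"

    source = BufferedSource(fetch, max_entries=2)
    print(source.query("monthly_sales"))
    state["up"] = False  # simulate an outage of the source system
    print(source.query("monthly_sales"))
```

The trade-off in this design is freshness against availability: a buffered answer may be stale, so a production system would likely also record when each entry was fetched and signal to the consumer that the result did not come from the live source.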

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
