Data virtualization enables organizations to increase analytics effectiveness and reduce analytics costs by creating a virtual layer that aggregates data from multiple sources. Companies can then access data from multiple sources without setting up a costly data warehouse or spending time on data preparation. It is also referred to as logical data warehouses (LDW), data federation, virtual databases, and decentralized data warehouses.

A traditional data warehouse relies heavily on extract, transform, load (ETL) processes, which require significant programming effort with specialized tools and scripting languages. A logical data warehouse instead creates a virtual layer that handles these steps.
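To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`orders_src`, `orders_dw`, `orders_v`) and the cents-to-dollars transformation are hypothetical. The ETL path extracts rows, transforms them in application code, and loads a physical copy; the virtual path defines a view that applies the same transformation at query time, so nothing is copied.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_src (id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders_src VALUES (?, ?)", [(1, 1250), (2, 999)])
    return conn


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: extract rows, transform them in Python, load them into a physical copy."""
    rows = conn.execute("SELECT id, amount_cents FROM orders_src").fetchall()  # extract
    transformed = [(oid, cents / 100) for oid, cents in rows]  # transform
    conn.execute("CREATE TABLE orders_dw (id INTEGER, amount_usd REAL)")
    conn.executemany("INSERT INTO orders_dw VALUES (?, ?)", transformed)  # load


def define_virtual_view(conn: sqlite3.Connection) -> None:
    """Virtual layer: the same transformation, evaluated each time the view is queried."""
    conn.execute(
        "CREATE VIEW orders_v AS "
        "SELECT id, amount_cents / 100.0 AS amount_usd FROM orders_src"
    )


conn = build_demo()
run_etl(conn)
define_virtual_view(conn)
print(conn.execute("SELECT * FROM orders_dw ORDER BY id").fetchall())
print(conn.execute("SELECT * FROM orders_v ORDER BY id").fetchall())
```

Both queries return the same rows; the difference is that the ETL path stores a second copy of the data that must be reloaded whenever the source changes.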

What is data virtualization?

Data virtualization is a data management approach that enables data processing without dealing with the technical aspects of data storage. Wikipedia provides a more formal description:

Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located.
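The idea in this definition can be sketched in a few lines of Python: the application asks a virtual layer for a logical dataset by name, and adapters hide whether the data arrives as CSV or JSON. The dataset names, the inline sample data, and the `VirtualLayer` class are hypothetical illustrations, not the API of any data virtualization product.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]


def csv_adapter(text: str) -> Callable[[], List[Record]]:
    """Wrap CSV text so that reading it yields a list of dict records."""
    return lambda: list(csv.DictReader(io.StringIO(text)))


def json_adapter(text: str) -> Callable[[], List[Record]]:
    """Wrap a JSON array of objects so that reading it yields string-valued records."""
    return lambda: [{k: str(v) for k, v in obj.items()} for obj in json.loads(text)]


class VirtualLayer:
    """Maps logical dataset names to adapters; callers never see format or location."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Record]]] = {}

    def register(self, name: str, adapter: Callable[[], List[Record]]) -> None:
        self._sources[name] = adapter

    def read(self, name: str) -> List[Record]:
        # The underlying source is parsed again on every call; nothing is cached.
        return self._sources[name]()


layer = VirtualLayer()
layer.register("customers", csv_adapter("id,name\n1,Acme\n2,Globex\n"))
layer.register("orders", json_adapter('[{"id": 10, "customer_id": 1}]'))

print(layer.read("customers"))
print(layer.read("orders"))
```

The calling code only uses logical names such as `"customers"`; swapping the CSV source for a database or an API would mean registering a different adapter, not changing the callers.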

To understand data virtualization, we first need to understand what a traditional data warehouse (DW) is. A data warehouse is a commonly used data integration technique for centralizing data. We define it as:
Data warehousing is a technology that aggregates structured data from one or multiple sources in order to compare and analyze it for business intelligence. It is effective for getting a better understanding of the overall performance of a business because it makes a wide range of data available for analysis.
The Logical Data Warehouse (LDW) is the most common implementation of data virtualization. Gartner coined the term in 2011. An LDW differs from a traditional data warehouse because it is not monolithic: besides the organization's core data warehouse, its architecture includes external data sources such as enterprise systems, web data, and cloud data.
An LDW connects multiple data sources and makes their data accessible through SQL queries. In an LDW, the data remains in place, and queries access the source systems in real time. Gartner Research Vice President Mark Beyer defines the logical data warehouse as follows:
Logical Data Warehouse is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.
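As a small-scale illustration of querying data in place with SQL, SQLite's `ATTACH DATABASE` lets one connection join tables that live in separate database files without copying them. The file names `crm.db` and `sales.db` and their schemas are hypothetical stand-ins for independent source systems; a real LDW would connect to heterogeneous remote systems rather than local SQLite files.

```python
import os
import sqlite3
import tempfile


def make_source(path: str, ddl: str, insert: str, rows: list) -> None:
    """Create an independent database file that stands in for a source system."""
    src = sqlite3.connect(path)
    try:
        src.execute(ddl)
        src.executemany(insert, rows)
        src.commit()
    finally:
        src.close()


def federated_revenue_by_customer(crm_path: str, sales_path: str) -> list:
    """Join customers and orders across two databases; the data stays in its files."""
    conn = sqlite3.connect(":memory:")  # the "virtual layer" stores no data itself
    try:
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            "SELECT c.name, SUM(o.amount) "
            "FROM crm.customers AS c JOIN sales.orders AS o ON o.customer_id = c.id "
            "GROUP BY c.name ORDER BY c.name"
        ).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    crm = os.path.join(tmp, "crm.db")
    sales = os.path.join(tmp, "sales.db")
    make_source(crm, "CREATE TABLE customers (id INTEGER, name TEXT)",
                "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    make_source(sales, "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
                "INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])
    result = federated_revenue_by_customer(crm, sales)
    print(result)
```

The query reads both files at the moment it runs, so any change committed to either source shows up in the next result without a reload step.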

Why is it important now?

According to Gartner, data virtualization is the evolution and augmentation of data warehouse practices, not a replacement. These factors drive its growing importance:

  • Increasing complexity of businesses: Acquisitions and phases of fast growth leave businesses with a multitude of physical databases that are not integrated. Data virtualization is the fastest approach to combining them for analytics.
  • Increasing importance of analytics: As organizations' interest in analytics and data-driven decision making rises, the logical data warehouse becomes more important because it enables a faster analytics process.
  • Data-hungry AI algorithms: While analytics in the past relied on the relationships between a few variables, modern deep learning algorithms are data-hungry and can identify counterintuitive relationships in data. Therefore, aggregating only critical data in a data warehouse is no longer sufficient for advanced analytics applications.
  • Increasing data volumes: The rate of data generation keeps increasing, which makes it harder to keep a physical data warehouse up to date. Data virtualization is a more advanced approach to processing data from remote locations: it allows some data processing to be completed at the remote storage systems, reducing data communication time.
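Processing data where it is stored is often called predicate (or query) pushdown. The sketch below, again using SQLite as a stand-in for a remote source, compares fetching every row and filtering in the virtual layer with sending the filter to the source so that only matching rows are returned. The `events` table and its contents are hypothetical, and counting returned rows is only a rough proxy for network transfer.

```python
import sqlite3
from typing import List, Tuple


def make_remote_source() -> sqlite3.Connection:
    """Stand-in for a remote system holding a large table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, region TEXT, value REAL)")
    regions = ["EU", "US", "APAC", "LATAM"]
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(i, regions[i % len(regions)], float(i)) for i in range(10_000)],
    )
    return conn


def fetch_then_filter(conn: sqlite3.Connection, region: str) -> Tuple[int, List[tuple]]:
    """No pushdown: transfer every row, then filter in the virtual layer."""
    rows = conn.execute("SELECT id, region, value FROM events").fetchall()
    return len(rows), [r for r in rows if r[1] == region]


def pushdown_filter(conn: sqlite3.Connection, region: str) -> Tuple[int, List[tuple]]:
    """Pushdown: the source evaluates the WHERE clause; only matches are transferred."""
    rows = conn.execute(
        "SELECT id, region, value FROM events WHERE region = ?", (region,)
    ).fetchall()
    return len(rows), rows


source = make_remote_source()
sent_naive, naive = fetch_then_filter(source, "EU")
sent_pushed, pushed = pushdown_filter(source, "EU")
print(f"rows transferred without pushdown: {sent_naive}")
print(f"rows transferred with pushdown:    {sent_pushed}")
```

Both approaches return the same matching rows, but with pushdown the source sends a quarter of the table instead of all of it.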

How does it work?

Data virtualization combines two technologies to give organizations flexible and scalable access to data:

  • Data federation: Connects multiple databases and presents them to the user as a single database. This provides flexibility in querying data.
  • Analytical database management: Provides scalability to a logical data warehouse. Analytical databases are available as data warehouse appliances, and no effort is required to relocate data for analysis in this virtual layer.
Source: Simplicity BI
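Data federation can be sketched with the same SQLite mechanism: two independent databases with the same schema (hypothetical `eu.db` and `us.db` regional order stores) are attached to one connection, and a temporary view unions them so that users query a single logical `all_orders` table. This illustrates only the federation idea; it says nothing about how commercial analytical engines achieve scalability.

```python
import os
import sqlite3
import tempfile
from typing import Dict, List, Tuple


def create_regional_store(path: str, rows: List[Tuple[int, float]]) -> None:
    """Create one independent regional database with an orders table."""
    db = sqlite3.connect(path)
    try:
        db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        db.commit()
    finally:
        db.close()


def open_federated_view(paths: Dict[str, str]) -> sqlite3.Connection:
    """Attach each regional store and expose all of them as one logical table."""
    conn = sqlite3.connect(":memory:")
    selects = []
    for alias, path in paths.items():
        # Aliases are trusted identifiers in this sketch, so f-string interpolation is safe here.
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(f"SELECT '{alias}' AS region, id, amount FROM {alias}.orders")
    # A TEMP view is allowed to reference tables in attached databases.
    conn.execute("CREATE TEMP VIEW all_orders AS " + " UNION ALL ".join(selects))
    return conn


with tempfile.TemporaryDirectory() as tmp:
    paths = {"eu": os.path.join(tmp, "eu.db"), "us": os.path.join(tmp, "us.db")}
    create_regional_store(paths["eu"], [(1, 20.0), (2, 30.0)])
    create_regional_store(paths["us"], [(1, 45.0)])
    conn = open_federated_view(paths)
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM all_orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    print(totals)
```

From the user's point of view there is one table; adding a third region would mean attaching another database and extending the view, not migrating data.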

What are the advantages of an LDW over a traditional DW?

The benefits of data virtualization (logical data warehouses) fall into two categories:

Ease of use

  • A logical data warehouse is up to 90% faster to implement since less effort is required for its setup.
  • The format of the source files does not restrict access to data for analysis
  • Data can be accessed via a range of services, e.g., SOAP, OData, and SharePoint

Improved analytics effectiveness

  • Data virtualization minimizes data latency, enabling real-time analysis of data from different sources. Depending on data transfer speeds, data virtualization may not always deliver true real-time analysis, but it offers more up-to-date data than a physical data warehouse (DW), which cannot be refreshed every minute in a cost-effective manner.
  • A logical data warehouse does not store data; data is accessed at its source. Because logical data warehouse features replace extract, transform, and load (ETL) processes, data scientists can shift their focus to querying and analyzing data.
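The freshness point can be illustrated with a minimal sketch: a one-off ETL copy is a snapshot that goes stale as soon as the source changes, while a view over the source reflects new rows on the next query. The `readings` table and its values are hypothetical, and a single in-memory SQLite database stands in for both the source and the two access paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])

# ETL-style snapshot: the rows are physically copied at load time.
conn.execute("CREATE TABLE readings_snapshot AS SELECT * FROM readings")
# Virtual access: the view stores only the query, not the data.
conn.execute("CREATE VIEW readings_live AS SELECT * FROM readings")

# The source system receives a new reading after the last ETL run.
conn.execute("INSERT INTO readings VALUES (?, ?)", ("c", 3.0))


def count(table: str) -> int:
    """Count rows in a table or view (table names here are trusted constants)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


print("snapshot rows:", count("readings_snapshot"))
print("live view rows:", count("readings_live"))
```

The snapshot still reports two rows until the next ETL run, while the view already includes the third.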

What are the challenges of data virtualization?

  • A sufficient number of data sources (more than 10) is needed for a logical data warehouse to pay off in terms of efficiency. Otherwise, the trade-off between cost and speed may not be worth it.
  • A logical DW does not provide a single source of truth the way a traditional DW does. Ensuring the stability, availability, data consistency, and correctness of a logical DW can be challenging for organizations.

What are the leading tools for LDW?

  • Actifio Sky
  • AtScale Virtual Data Warehouse
  • Data Virtuality
  • Denodo
  • IBM Cloud Pak for Data
  • Informatica PowerCenter
  • Oracle Data Service Integrator
  • Red Hat JBoss
  • SAS Federation Server
  • Stone Bond Technologies
  • TIBCO

Informatica is the leading company in the logical DW market and generates $1.1B in annual revenue.

If you want to learn more about ETL processes in traditional DWs, see our articles on ETL and ETL tools.

If you still have questions, don’t hesitate to contact us.
