MMH Data Warehousing, Corporate Portal & e-business Applications.

Data Warehousing, Corporate Portal & e-Business Intelligence Applications

 

Mistakes to Avoid in Building Data Warehouses

Table of Contents:

Selection of the Wrong Architecture

Many data warehouses fail due to the selection of architecture that is incapable of meeting the business requirements. The wrong architectural decisions can lead to the total failure of the data warehouse.

Due to inexperience and a desire to build data warehouses quickly and cheaply, many organizations suffer repeated architecture failures. Architectures that are very likely to fail include the following:

  • "Virtual" data warehouse that provide direct, on-line access to operational data and that fail due to disruption of operational systems
  • "Data-Mart-in-a-Box" that fails due to lack of a coherent data extraction facility and lack of data quality
  • "Stovepipe" data marts that fail due to lack of integration across business areas

The recommendation presented in this paper is to design the data warehouse to avoid building virtual data warehouses, data-marts-in-a-box, and stovepipe data marts. As discussed below, successful data warehouses are typically based on an integrated architecture incorporating an extraction/transformation/load tool, a central data warehouse and multiple dependent data marts tightly integrated with a central metadata repository. A great deal of time and effort can be saved by building the data warehouse right the first time using a fully integrated architecture. Virtual data warehouses, data-marts-in-a-box, stovepipe data marts, and fully integrated data warehousing architectures are described in the next sections.

"Virtual” Data Warehouse - An Architecture to Avoid

An architectural option that is attractive to many business managers is to buy a simple data warehousing tool that provides end users with direct access to operational data on legacy databases. In this option, called a virtual data warehouse, data is not moved from source databases to a target data warehouse. As shown in Figure 1, there is no target database and no data warehouse. Business analysts use intuitive, end-user tools to access data directly from operational files, using common business terms.

This option is popular because it enables business analysts to access and analyze data from operational systems, such as Oracle Financials or SAP R/3, without having to rely on IT personnel to extract and process the data. Virtual data warehouses utilize existing legacy systems as database servers and require minimal investment in additional hardware and software.

The advantage of virtual data warehousing is its simplicity. There is no need for complex data extraction and transformation tools, metadata repositories, or physically separate data warehouse databases. The drill-through capabilities of conventional front-end data access and analysis tools are used to pull data from operational data sources and provide multidimensional views of the source data. The front-end data access and analysis tools simplify access to source data on legacy systems, and hide the complexity of the existing mainframe systems. The ability to access data directly from legacy systems with intuitive front-end data warehousing tools preserves the investment in mainframe hardware, software, databases, and skillsets.

The primary limitation of virtual data warehousing systems is that a true data warehouse is not created. There is no data warehouse database, no historical data, little summarization or aggregation of data, no central metadata, and no cleaning-up and reorganization of data to make it understandable by end users. Furthermore, since query functions access data directly from source files, on-line operations are likely to be disrupted. Query functions such as “Show me the inventory level, sell through, and turnover for my ten highest and lowest selling items” can result in a massive number of query functions that can severely impact the operational performance of host computers. In a short time, virtual data warehouse tools will make you extremely unpopular with on-line operations personnel, due to the interference of DSS queries with on-line operations.

Recommendation: Avoid building virtual data warehouses. They are certain to cause operational problems by interfering with on-line transaction processing systems. Although virtual data warehouses may save money initially, they will have to be re-implemented on a dedicated, physically separate data warehousing facility at some time in the future. Moving from a virtual data warehouse to a dedicated warehouse may be extremely disruptive if it occurs at a time when the data warehouse is required to support a large number of business end users. If an organization does build virtual data warehouses to support direct access to operational data, these “warehouses” should be considered temporary facilities, which will be replaced as soon as possible with a physically separate data warehousing system.

[ TOP ]