MMH Data Warehousing, Corporate Portal & e-business Applications.

Data Warehousing, Corporate Portal & e-Business Intelligence Applications

 

Mimno, Myers & Holum

COLUMN: HOW TO AVOID DATA MART CHAOS USING HYBRID METHODOLOGY

FlashPoint Column - June 7, 2002
by Pieter Mimno
Independent Consultant

SUMMARY: Many data warehousing projects fail because they take too long and cost too much. This article describes how you can build successful data warehousing applications at low cost and low risk using hybrid development techniques, while avoiding "data mart chaos". The recommended approach avoids development of non-integrated data marts by incorporating a rigorous infrastructure that ensures synchronization of all physical target databases with an Entity-Relationship model, and synchronization of all business rules and data definitions with a central metadata repository.

I NEED A SOLUTION YESTERDAY!
In my consulting practice, I encounter business managers who have extremely painful problems to solve. The problems may be related to an inability to respond rapidly to business challenges, lack of understanding of customer needs, inability to deal with a chaotic jumble of non-integrated, stovepipe applications, or lack of a clean, consistent source of data for decision making. Whatever the problem is, the business manager often needs a solution to the problem yesterday. Typically, the longest time the manager is willing to wait for a solution is 90 days. Coincidentally, 90 days is also the longest time that most CFOs are willing to wait for an initial ROI on an investment in technology.
If data warehousing is identified as the appropriate technology to provide the business solution, the manager is faced with a difficult decision: how can data warehousing technology be used to provide a meaningful solution to the business problem and a measurable ROI in 90 days?

HYBRID DEVELOPMENT TECHNIQUES
I have been successful in meeting this challenge with a hybrid of classical top-down and bottom-up development techniques. In common with other approaches, the hybrid methodology builds the data warehousing application incrementally, one business area at a time. However, many of the techniques used in the hybrid methodology are quite different from those advocated by leading practitioners of top-down development. Characteristics of hybrid techniques include:

  • Global requirements are defined through short interviews with individual business units, each lasting one day or less
  • Development of the initial enterprise data model is constrained to 2 to 3 weeks. As data marts are added, the data model for the data warehouse is expanded via incremental enterprise data modeling techniques
  • Development of the first operational data mart is completed within a 90-day timebox. Successive data marts may be developed every 60 to 90 days
  • Data marts are developed prior to the data warehouse. Unlike in the traditional model, data marts are populated directly by an ETL tool rather than from the data warehouse. Similarly, aggregates are computed by the ETL tool, not within the data warehouse, using incremental aggregation techniques
  • Data marts contain both atomic transactions that are relevant to the specific business area and summarized or aggregated data
  • Development of a central data warehouse is optional and may be deferred until it is required to reduce redundancy of atomic data, or to consolidate atomic data in one central database
  • Development of an Operational Data Store is optional and may be deferred
  • The data warehousing application is based on a "hub-and-spoke" architecture, but the ETL tool is the hub, not the data warehouse
  • All components of the architecture are integrated with metadata generated and synchronized automatically by an ETL tool
  • Data marts are dependent, but their dependency is based on deriving local metadata from central metadata definitions, not on deriving all of their data from a central data warehouse
  • If a staging area is used, it is a volatile flat file with no user access
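
The incremental aggregation mentioned in the list above can be illustrated with a minimal sketch. It assumes a hypothetical sales data mart whose aggregate table is keyed by region and month; the function names and field names are illustrative only and do not correspond to any particular ETL tool.

```python
from collections import defaultdict


def new_aggregates():
    """Create an empty aggregate table keyed by (region, month)."""
    return defaultdict(lambda: {"total": 0, "count": 0})


def apply_increment(aggregates, new_transactions):
    """Fold a batch of new atomic transactions into existing aggregates.

    Only the new batch is scanned; transactions loaded in earlier
    batches are never re-read (hypothetical helper, not a tool API).
    """
    for tx in new_transactions:
        key = (tx["region"], tx["month"])
        aggregates[key]["total"] += tx["amount"]
        aggregates[key]["count"] += 1
    return aggregates


aggregates = new_aggregates()

# First load cycle
apply_increment(aggregates, [
    {"region": "East", "month": "2002-05", "amount": 100},
    {"region": "East", "month": "2002-05", "amount": 50},
])

# Second load cycle: only the newly arrived transactions are processed
apply_increment(aggregates, [
    {"region": "East", "month": "2002-05", "amount": 25},
    {"region": "West", "month": "2002-06", "amount": 40},
])
```

The point of the sketch is that each load cycle costs time proportional to the size of the new batch, not to the full transaction history held in the data mart.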
Although the hybrid development effort is focused on incremental development of data marts, the data marts are embedded within a long-term enterprise data warehousing architecture that is specified in an early phase of the methodology. Components of the recommended enterprise data warehousing architecture include multiple data sources, an ETL tool, a central metadata repository and metadata exchange architecture (both of which are components of the ETL tool), a data modeling tool, a central data warehouse, multiple data marts, and one or more BI tools.
A critical component of the architecture is the central metadata repository, encompassing both business rules and the technical characteristics of the data and data transformations, which are shared by the various subject area data marts. Shared metadata is the "glue" that holds together the other key components of the enterprise data warehousing architecture, such as ETL and end-user tools, and prevents the occurrence of "stovepipe" data marts that cannot be reconciled to each other or to source systems. The creation of reusable data transformation objects in a shared metadata environment is critical to enabling rapid responsiveness to change in user reporting and analysis needs.
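
As a rough illustration of how shared metadata keeps data marts reconcilable, the following sketch models a central repository from which each data mart derives its local definitions. All class, function, and field names here are hypothetical, chosen for the example; a commercial ETL tool would expose its own repository interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Definition:
    """One centrally managed data definition (illustrative structure)."""
    name: str
    data_type: str
    business_rule: str


@dataclass
class CentralRepository:
    definitions: dict = field(default_factory=dict)

    def register(self, definition):
        self.definitions[definition.name] = definition

    def derive_local(self, names):
        """Return the subset of central definitions a data mart uses.

        A mart cannot invent its own definition: every requested name
        must already exist centrally.
        """
        missing = [n for n in names if n not in self.definitions]
        if missing:
            raise KeyError(f"not defined centrally: {missing}")
        return {n: self.definitions[n] for n in names}


repo = CentralRepository()
repo.register(Definition("customer_id", "INTEGER", "surrogate key assigned at load"))
repo.register(Definition("net_revenue", "DECIMAL(12,2)", "gross revenue minus returns"))
repo.register(Definition("order_date", "DATE", "date the order was booked"))

# Two subject-area marts derive local metadata from the same source
sales_mart = repo.derive_local(["customer_id", "net_revenue", "order_date"])
finance_mart = repo.derive_local(["customer_id", "net_revenue"])

# Both marts hold the very same definition, so their figures reconcile
shared = sales_mart["net_revenue"] is finance_mart["net_revenue"]
```

Because a mart can only obtain definitions that already exist centrally, two marts cannot silently disagree on what a term such as net revenue means, which is the failure mode behind stovepipe data marts.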
The hybrid methodology represents an alternative to traditional data warehousing development techniques that have been in use for many years. In the hybrid approach, development of more complex components of the architecture, such as a central data warehouse and an ODS, is optional and may be deferred until later stages of the development effort. The incremental development effort is kept under control through use of logical data modeling techniques (E-R diagrams that gradually expand to an enterprise model), and integration of all components of the architecture with central metadata, generated and maintained by the ETL tool.
The bottom-up approach has the advantage that it requires little up-front investment and builds the application one step at a time, proving the success of each step before going on to the next step. The first deliverable of the bottom-up approach is a fully functional data mart for a specific subject area, not a proof-of-concept. Subsequent data marts are delivered every 60 to 90 days.
In contrast, top-down development methodologies, which are derived from Information Engineering techniques, focus on the data warehouse. Top-down development typically requires lengthy interviews with end users, detailed studies of functional requirements across multiple business areas, and time-consuming, up-front enterprise-level data modeling. Due to the extensive amount of up-front effort required to specify and develop significant portions of the enterprise data warehouse, top-down development often requires 6 to 8 months to produce tangible business benefits in the form of operational data marts. This is generally unacceptable to business managers who are looking for a much more rapid solution to their business problems.
Top-down development methodologies often recommend development of multiple, large, normalized databases, including a data warehouse, Operational Data Store, persistent Staging Area, Exploration Warehouse, Data Mining Warehouse, etc. Development of these large, normalized databases, all of which are optional in the hybrid approach, can be an extremely complex, expensive process.
Top-down advocates assert that bottom-up or hybrid development leads to "data mart chaos", i.e., a plethora of non-integrated, stovepipe data marts that do not operate within a solid data warehousing architecture. However, as discussed above, the hybrid approach is based on a rigorous infrastructure that ensures that all physical target databases are synchronized with a single logical Entity-Relationship model, and all business rules and data definitions throughout the system are synchronized with central metadata, generated and maintained by the ETL tool. Hybrid development techniques have demonstrated over many years that it is possible to build large, complex data warehousing applications, one business unit at a time, without causing data mart chaos.
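
The synchronization of physical target databases with a single logical model can be pictured as a drift check, sketched below under simplifying assumptions. The logical model is represented as a plain dictionary of entities and attribute types, and the function `find_drift` is a hypothetical helper, not a feature of any specific data modeling or ETL product. A data mart may implement only a subset of an entity's attributes, so the check flags only tables or columns that contradict the model or are absent from it.

```python
def find_drift(logical_model, physical_tables):
    """Compare physical tables against the logical E-R model.

    Returns a dict mapping each out-of-sync table to a list of problems.
    Tables that implement a consistent subset of an entity are accepted.
    """
    problems = {}
    for table, columns in physical_tables.items():
        entity = logical_model.get(table)
        if entity is None:
            problems[table] = ["table has no entity in the logical model"]
            continue
        issues = []
        for col, col_type in sorted(columns.items()):
            if col not in entity:
                issues.append(f"{col}: not in logical model")
            elif entity[col] != col_type:
                issues.append(f"{col}: type {col_type}, model says {entity[col]}")
        if issues:
            problems[table] = issues
    return problems


# Illustrative logical model (entities and attribute types are made up)
logical_model = {
    "customer": {"customer_id": "INTEGER", "name": "VARCHAR", "region": "VARCHAR"},
    "sale": {"sale_id": "INTEGER", "customer_id": "INTEGER", "amount": "DECIMAL"},
}

# Illustrative physical schema of one data mart
physical_tables = {
    "customer": {"customer_id": "INTEGER", "region": "CHAR"},
    "sale": {"sale_id": "INTEGER", "amount": "DECIMAL"},
    "promo": {"promo_id": "INTEGER"},
}

drift = find_drift(logical_model, physical_tables)
```

In this example the `sale` table passes, while `customer` is flagged for a type mismatch and `promo` for having no counterpart in the model. Running such a check whenever a data mart is added or changed is one way to catch a stovepipe before it reaches production.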

Note: This series of articles describes steps in a hybrid methodology that I have found to be successful for the implementation of data warehousing applications. The primary goals of the methodology are to reduce the up-front effort required to specify the functionality of a data warehousing application and deliver data marts in 90 days or less, at low cost and low development risk. The overall methodology is summarized in previous FlashPoint articles by Pieter Mimno.
For further information about the issues discussed in this report, please contact Pieter Mimno, Independent Consultant, at pmimno@mimno.com, or visit his Web site at www.mimno.com. Mr. Mimno specializes in the selection of system components and support for all phases of development of data warehousing applications.

Reprinted with permission from The Data Warehousing Institute. 
Copyright 2002. The Data Warehousing Institute.
