|
|
Mimno, Myers & Holum
"Rebuilding a Stalled Data Warehouse"
FlashPoint Column - July 10, 2001
"Rebuilding a Stalled Data Warehouse"
One of my consulting clients was struggling with a stalled data warehouse that was causing serious problems. The data warehouse was failing to meet the primary business requirement, which was to produce consolidated financial reports that were reconcilable and easy to modify. The data warehouse used an inefficient architecture consisting of hand-written data extraction routines and multiple, non-integrated reporting programs. There was no extraction/transformation/load (ETL) tool, no automated business-intelligence (BI) tool, and no meta data integration. High-level summary data could not be reconciled with lower-level transactional data, leading to serious data integrity and reporting problems. The application was slow, inflexible, unscalable, and difficult to modify to meet changing business requirements.
Under intense pressure to solve the business problem, the project manager had two choices: patch the existing data warehouse with band-aids and hope that it would succeed, or rebuild the data warehouse, utilizing the best technology and data warehousing practices available. He courageously chose the latter course.
The rebuilding effort centered on the use of an ETL tool as the heart of the new data warehousing architecture. The selected ETL tool incorporated native interfaces to extract data from all required data sources (including SAP R/3), transform the data, and load multiple target databases. It also automatically generated meta data that was used to integrate all components of the architecture with common business rules and technical definitions. The rebuilt data warehouse is successful and is being expanded to support additional business requirements.
Many organizations are facing the challenge of rebuilding stalled data warehouses. Organizations that built data warehouses in the past with old technology are now experiencing significant scalability, performance, and integration problems. Data warehouses stall and need to be rebuilt for many reasons, including failure to meet the needs of the business, lack of return on investment, obsolete technology, inappropriate or incorrect architecture, poor performance, inability to scale-up to large system requirements, inability to respond rapidly to change requests from users, lack of integration of components at the meta data level, and lack of flexibility in query, reporting, and analysis functions. Due to lack of meta data integration, many previously built data marts are "stovepipes" that do not share business rules and definitions. Non-integrated, stovepipe data marts cannot be used to support consolidated financial reporting.
These problems can be overcome through the use of new technology that was not available until recently. This technology includes powerful, second-generation ETL tools, new business intelligence tools, meta data integration techniques, incremental aggregation methods, hybrid OLAP technology, and growing integration of data warehousing with real-time systems.
New technology that you can use to rebuild stalled data warehousing applications includes powerful, second-generation ETL tools that can be utilized as the hub of a hub-and-spoke architecture. ETL tools automatically generate executable applications, saving the time and expense required to write and maintain procedural code. An important function of ETL tools is to automate the extraction, transformation, and load functions, while eliminating the need to write and maintain procedural code. ETL tools also support native interfaces to a wide range of source and target databases, including legacy systems, RDBMSs, ERP, CRM, Web logs, XML, and MQSeries message queues.
Leading ETL tools also support an extensible meta data exchange architecture that permits all components of the e-Business environment to be integrated at the meta data level. A meta data exchange architecture permits integration of ETL tools, BI tools, data modeling tools, ERP and CRM applications, analytic applications, corporate portals, and Web logs, using common business and technical definitions. To obtain maximum benefit from advanced ETL technology, organizations should strive to buy only components that integrate with a central meta data repository generated and maintained by an ETL tool.
Data quality issues should be addressed as part of the rebuilding process. Data cleansing tools are used to measure data quality and identify data with inconsistent, missing, incomplete, duplicative, or incorrect values. The same tools can be used to standardize, integrate, de-duplicate name and address fields, identify multiple occurrences of the same individual, and identify householding relationships. Look for data cleansing tools that integrate closely with ETL tools.
An important decision in the rebuilding process is selection of an appropriate development methodology. My experience in numerous consulting engagements is that data warehousing applications should be built bottom-up, not top-down. Bottom-up development focuses on incremental development of the data warehousing application, one business unit at a time. A major benefit of bottom-up development is that it minimizes development risk by focusing on small developmental steps and proving each step before moving on to the next step. Bottom-up development provides early, measurable return-on-investment by delivering a major increment of functionality every 90 days or less. In contrast, top-down development focuses on the specification of an enterprise model for the application and implementation of a central data warehouse. This often proves to be a lengthy, complex, risky, and expensive process that defers return-on-investment.
The use of new business-intelligence tools in the rebuilt data warehouse can provide business users with an intuitive, transparent interface to the data in the data warehouse. Make sure that the BI tool that you use is integrated at the meta data level with the ETL tool that you have selected. In practice, this means that the ETL tool should be selected first, followed by selection of a compatible BI tool. BI tools are moving rapidly to implement hybrid OLAP technology, including use of the new MDX multidimensional query language, integration of multidimensional and relational database technologies, and use of persistent, multidimensional caches to improve the performance of the BI tool.
Rebuilding a stalled data warehouse is a complex process that requires careful analysis of the causes of the failure and conscious understanding of how you can do better through application of advances in technology and utilization of best practices in data warehousing. There is no excuse for accepting the limitations of a stalled data warehouse.
For further information about the issues discussed in this report, please contact Pieter Mimno, Independent Consultant, at pmimno@mimno.com, or visit his Web site at www.mimno.com. Mr. Mimno specializes in the selection of system components and support for all phases of development for data warehousing, corporate portals, and eBusiness-intelligence applications
|
 |