![]() |
Data Warehousing, Corporate Portal & e-Business Intelligence Applications |
|||||||||||||||||||||
![]() |
![]() |
![]() |
||||||||||||||||||||
|
Mistakes to Avoid in Building Data Warehouses #4Table of Contents:
Architected Data MartsThe solution to this problem is to build architected data marts that utilize a metadata exchange architecture to ensure that the local metadata maintained by each data mart conforms with definitions maintained in a central metadata repository. Figure 5 illustrates the configuration of an architected data mart. An extraction/transformation/load (ETL) tools, such as PowerMart from Informatica Corporation, or DataStage from Ardent Software, is used to generate and maintain a central metadata repository. The central metadata repository provides a single version of the truth that can be used to define enterprisewide source data definitions, data models for target databases, and transformation rules that convert source data into target data. The central metadata repository maintained by the ETL tool is the heart of the data warehouse. It is an essential component of the architecture used to integrate all components of the data warehouse. To avoid building stovepipe data marts, IS must enforce a rule that requires developers of data marts to utilize tools approved by IS that have demonstrated the ability to integrate with the central metadata repository maintained by an ETL tool. Ideally, tools used in each data mart should dynamically read metadata from the central metadata repository to define local data definitions and models. The central metadata repository provides a single source of data definitions and models that can be used to ensure that all components of the architecture operate together within a single, coherent, integrated design. With the exception of a small number of end-user tools, it is currently difficult to integrate data marts with a central metadata repository. Today, vendors of ETL tools generate proprietary metadata that is incompatible with metadata generated by other ETL tool vendors. The lack of metadata standards makes it difficult to select best-of-breed tools from multiple vendors and integrate these tools within a common architecture. The OLAP Council tried, but failed, to obtain industry-wide agreement on the definition of a metadata standard. Recently, in another attempt to define a metadata standard, it published an Oracle-led, read-only metadata specification, called MDAPI 2.0, which is packaged as Java libraries and COM objects. The MDAPI 2.0 supports heterogeneous computing environments, including Windows, Unix, OS/2, Macintosh, and Java. Microsoft introduced an alternative metadata standard based on its Open Information Model (OIM) and an open API, called OLE DB for OLAP, for the exchange of metadata. OIM utilizes a set of layered COM-based components that define a metamodel for storing and managing objects in a repository. A vendor-neutral information model describing the structure and semantics of metadata is badly needed. In December, 1998, Microsoft turned over ownership and administration of the OIM to the Meta Data Coalition, which will integrate OIM with its existing Meta Data Interchange Specification (MDIS). Standards defined by Microsoft and the Meta Data Coalition are likely to be adopted by the entire industry. Several data warehousing vendors have already released versions of their products that are compliant with OIM and OLE DB for OLAP. In addition to compliance with open, published metadata standards, these vendors are likely to support their own proprietary metadata standard. If metadata standards are widely adopted, it will become much easier to share metadata among multiple data warehousing tools. This will alleviate the stovepipe data mart problem by making it possible to integrate data warehousing components from multiple vendors with a central metadata repository generated and maintained by an ETL tool. Architected Data Mart as Your First Pilot ProjectLet’s assume that your organization has decided to build a data warehouse for multiple business units. You have designed a long-term enterprise architecture for the data warehouse on paper based on an analysis of the projected needs of multiple business units for data warehousing facilities. The enterprise architecture incorporates a large central warehoused used to store detailed, transaction-level data, and multiple data marts used to store summarized and aggregated data for individual business units. You have decided to build the data warehouse using a bottom-up development methodology that focuses on the development of data marts one business area at a time. This is an appropriate methodology for organizations that are under time pressure to implement multiple data marts within an enterprise data warehousing strategy. As part of the methodology, you have identified the business drivers for the data warehouse, surveyed user needs, designed the long-term, enterprise data warehousing architecture on paper, defined the functional requirements for the first business area, and selected the primary components of your data warehouse. You are now ready to build your first data mart. The initial data mart will be implemented as a proof-of-concept pilot project for the selected business area. It is important that your pilot project is not a toy. It must demonstrate the viability of the data warehousing architecture and components you have selected. The pilot project must also demonstrate that you have a solution to three of the most common mistakes made in the development of data warehouses:
The recommended approach to avoid all three of these problem areas is to implement the pilot project as an architected data mart. An architected data mart consists of the following components:
Although the architecture is not complex, it can be used to build a realistic data mart that demonstrates solutions to the “virtual” data warehouse problem, “dirty data” problem, and “stovepipe” data mart problem. In spite of the simplification of the data mart architecture, it is a good starting point for the development of the data warehouse. Using this simplified architecture, a pilot project for an initial data mart can be developed rapidly, at low risk, with relatively little expenditure of resources. The initial data mart is implemented as a proof-of-concept pilot project for one business area. It is the first actual data mart built using the specified data warehousing components. The architecture of the initial data mart is an exact subset of the enterprise data warehousing architecture that has been defined previously on paper. The architecture includes multiple data sources, a data extraction/transformation/load (ETL) tool, a metadata exchange architecture, a target database, and a simple desktop OLAP tool for end users. Rather than postponing the use of an ETL tool to a later phase, the architecture of the proof-of-concept pilot project includes an ETL tool. The ETL tool is incorporated in the pilot project to determine as early as possible whether the selected tool can support the specific functional requirements of the application. The function of the pilot project is to test all decisions made in the project planning phase, including the specified hardware/software architecture, platforms, operating environments, data modeling tool, data extraction and transformation tool, metadata exchange architecture, databases, data access and analysis tools, middleware, development tools, development methodology, and skillsets. The goal of the pilot project is to build the initial data mart in 90 to 120 days, at low cost, using a minimal number of tools. Although it has a simple architecture, the initial data mart is not a toy; it is a fully functional data mart that can be used to support the decision-making requirements of analysts for the initial business area. Business analysts have the ability to use the full power of a data access and analysis tool to query data in business terms, produce a wide variety of reports and graphics, analyze data in multiple dimensions, drill-down into the target database to access more detailed data, and drill-through to the detail data to access atomic transactions. The pilot project delivers a fully integrated, architected data mart, not a stovepipe data mart. All components of the initial data mart are constrained to integrate with central meta data. Subsequent data marts must plug into the common framework defined and tested in the pilot project. To ensure that all components share common business rules, semantics, and definitions, the IT organization must enforce the use of a common architectural standard for the development of all data marts. At the conclusion of the pilot project, all components of the hardware/software architecture, tools, and development methodology are reviewed to determine what worked and what did not work in a realistic business environment. Recommendation: Build your first pilot project using the architecture shown in Figure 5. The architecture includes multiple data sources, an off-the-shelf ETL tool, a central metadata repository, a metadata exchange architecture, a single target database, and a desktop OLAP end-user tool.
|