|
|
How to Select an Extraction/Transformation/Load Tool #3
Selection Criteria for ETL Tools
Functions to look for in ETL tools include the following:
- General-purpose ETL tool suitable for all data warehousing applications
- Support for data extraction, cleansing, aggregation, reorganization, transformation, calculation, and load operations, including the following functions: Access data from multiple operational data sources: Access data from multiple operational data sources
Re-map source data into a common format
Standardize data to enable load to conformed target databases
Filter data, convert codes, perform table lookups, calculate derived values
Automated, slowly changing dimension support (Type I, Type II, Type III)
Incremental aggregation - computation of aggregates by the ETL tool in one pass of the source data
Support for Unicode - multi-byte character sets localized for Japanese and other languages
Support graphical job sequencer, re-usable containers, and nesting of sessions
Validate data to check content and range of field values
Perform procedural data cleansing functions
Support complete development environment, including versioning and run-time debugger
Load cleansed data to the target data mart or central DW
Produce audit and operational reports for each data load
Automatic generation of centralized metadata
Automatic generation of data extract programs
- Native interfaces to legacy files, relational databases, ERP sources (e.g., SAP R/3 and PeopleSoft), e-Business applications, Web log files, IBM MQ-Series, XML sources, etc.
- Support for near real-time clickstream data warehousing
- Support for an Enterprise e-Business environment, including integration at the metadata level with BI tools, ERP applications, CRM applications, analytic applications, corporate portals, etc.
- Platform independence and scalability to enterprise data warehousing applications
- Directly executable, in-memory, multi-threaded processing for fast, parallel operation
- No requirement to generate and compile source code
- No requirement for intermediate disc files
- Support for concurrent processing of multiple source data streams, without writing procedural code
- Specification of ETL functions using pre-packaged transformation objects, accessible via an intuitive graphical user interface
- Extensible transformation objects at a high level of significance
- Ability to specify complex transformations using only built-in transformation objects. The goal is to specify transformations without writing any procedural code
- Automatic generation of central metadata, including source data definitions, transformation objects, target data models, and operational statistics
- Metadata exchange architecture that supports automatic synchronization of central metadata with local metadata for multiple end-user BI tools
- Central management of distributed ETL engines and metadata using a central console and a global metadata repository
- End-user access to central metadata repository via a right-mouse click
- Metadata exchange API compliant with COM, UML, and XML
- Support of metadata standards, including OLE DB for OLAP
- Ability to schedule ETL sessions on time or the occurrence of a specified event, including support for command-line scheduling using external scheduling programs
- Ability to schedule FTP sessions on time or event
- Integration with data cleansing tools
- Import of complete data models from external data modeing tools
- Strong data warehouse administration functions
- Support for the analysis of transformations that failed to be accepted by the ETL process
- Extensive reporting of the results of an ETL session, including automatic notification of significant failures of the ETL process
|
 |