MMH Data Warehousing, Corporate Portal & e-business Applications.

Data Warehousing, Corporate Portal & e-Business Intelligence Applications

 

How to Select an Extraction/Transformation/Load Tool #3

Selection Criteria for ETL Tools

Functions to look for in ETL tools include the following:

  • General-purpose ETL tool suitable for all data warehousing applications
  • Support for data extraction, cleansing, aggregation, reorganization, transformation, calculation, and load operations, including the following functions:– Access data from multiple operational data sources:– Access data from multiple operational data sources

    – Re-map source data into a common format
    – Standardize data to enable load to conformed target databases
    – Filter data, convert codes, perform table lookups, calculate derived values
    – Automated, slowly changing dimension support (Type I, Type II, Type III)
    – Incremental aggregation - computation of aggregates by the ETL tool in one pass of the source data
    – Support for Unicode - multi-byte character sets localized for Japanese and other languages
    – Support graphical job sequencer, re-usable containers, and nesting of sessions
    – Validate data to check content and range of field values
    – Perform procedural data cleansing functions
    – Support complete development environment, including versioning and run-time debugger
    – Load cleansed data to the target data mart or central DW
    – Produce audit and operational reports for each data load
    – Automatic generation of centralized metadata
    – Automatic generation of data extract programs

  • Native interfaces to legacy files, relational databases, ERP sources (e.g., SAP R/3 and PeopleSoft), e-Business applications, Web log files, IBM MQ-Series, XML sources, etc.
  • Support for near real-time clickstream data warehousing
  • Support for an Enterprise e-Business environment, including integration at the metadata level with BI tools, ERP applications, CRM applications, analytic applications, corporate portals, etc.
  • Platform independence and scalability to enterprise data warehousing applications
  • Directly executable, in-memory, multi-threaded processing for fast, parallel operation
  • No requirement to generate and compile source code
  • No requirement for intermediate disc files
  • Support for concurrent processing of multiple source data streams, without writing procedural code
  • Specification of ETL functions using pre-packaged transformation objects, accessible via an intuitive graphical user interface
  • Extensible transformation objects at a high level of significance
  • Ability to specify complex transformations using only built-in transformation objects. The goal is to specify transformations without writing any procedural code
  • Automatic generation of central metadata, including source data definitions, transformation objects, target data models, and operational statistics
  • Metadata exchange architecture that supports automatic synchronization of central metadata with local metadata for multiple end-user BI tools
  • Central management of distributed ETL engines and metadata using a central console and a global metadata repository
  • End-user access to central metadata repository via a right-mouse click
  • Metadata exchange API compliant with COM, UML, and XML
  • Support of metadata standards, including OLE DB for OLAP
  • Ability to schedule ETL sessions on time or the occurrence of a specified event, including support for command-line scheduling using external scheduling programs
  • Ability to schedule FTP sessions on time or event
  • Integration with data cleansing tools
  • Import of complete data models from external data modeing tools
  • Strong data warehouse administration functions
  • Support for the analysis of transformations that failed to be accepted by the ETL process
  • Extensive reporting of the results of an ETL session, including automatic notification of significant failures of the ETL process

[ TOP ]