A few years ago I had a client that required migrating and re-hosting a data warehouse from an Oracle environment to the Netezza appliance. The application had hundreds of tables and almost 2,900 source-to-target mappings, which would have taken thousands of hours of manual coding. For this client we implemented a reusable, data-driven architecture that relies on a metadata mart. The end result was a greatly reduced TCO for generating the required code versus manual coding. In addition, the process leaves behind a metadata mart that can be reported on for auditing and other Data Governance efforts. We generated over 100,000 lines of code and over 3,000 load scripts. After implementation the application can be maintained via the tables and automatically regenerated, providing a proactive or self-healing application architecture that responds to changes in incoming source files or target changes. Code: here is a version for Oracle Loader.
Prittie Good Approach: The objective is to "tactically" enable the client's analytical and reporting tools against their data via the Netezza platform very quickly, with minimal coding effort, but in a "strategic fashion". The "Prittie" approach loads a client's data into the Netezza platform as quickly and inexpensively as possible while still providing a reusable, "production level" set of code for implementation. The approach as defined is to minimize up-front source analysis and data profiling and leverage the "speed" of Netezza, while applying the specific coding techniques we developed to automatically generate the SQL code necessary to import data into the Netezza environment from industry-standard delimited files with headers and a mapping document. The application builds a Metadata Mart to support data-driven code generation and to serve as a tactical tool for Data Governance. The primary TCO driver, or value added, of this approach is significantly reduced time and cost for the initial load, data profiling, and evaluation of data quality; it also speeds up the process of business analysis and leverages an iterative methodology. In addition, business data profiling as well as source analysis can then be accomplished on the Netezza platform very quickly, while capturing critical load-"halting" problems as well as basic profiling metadata immediately and resolving them.
Target: Tables and columns are derived by parsing machine-generated DDL for the existing tables; this provides each table's columns, their formats, and the column order.
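To illustrate, here is a minimal sketch of how the target metadata might be harvested; it assumes clean, machine-generated CREATE TABLE statements containing only column definitions (plus an optional DISTRIBUTE ON clause), and the function name and regular expressions are mine, not the production parser:

import re

def parse_ddl(ddl_text):
    """Extract the table name and ordered (table, position, column, format) rows from one CREATE TABLE statement."""
    # Ignore any trailing DISTRIBUTE ON clause so the last ')' closes the column list.
    ddl_text = re.split(r'DISTRIBUTE\s+ON', ddl_text, flags=re.IGNORECASE)[0]
    table = re.search(r'CREATE\s+TABLE\s+([\w."]+)', ddl_text, re.IGNORECASE).group(1)
    body = ddl_text[ddl_text.index('(') + 1 : ddl_text.rindex(')')]
    rows = []
    # Split on the commas that separate column definitions, not the commas inside NUMERIC(10,2) and the like.
    for position, item in enumerate(re.split(r',(?![^(]*\))', body), start=1):
        name, fmt = item.strip().split(None, 1)
        rows.append((table, position, name, fmt))
    return rows

Each (table, position, column, format) row becomes a record in the metadata mart's target table.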
Source: Filenames and columns are derived by parsing the header records extracted from industry-standard tab-delimited files with headers; this provides the columns contained in each file and their order.
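The source side is simpler; the sketch below assumes one tab-delimited extract per file with the column names on the first record (the example filename is made up):

def parse_header(path):
    """Read the header record of a tab-delimited file and return (filename, position, column) rows."""
    with open(path, 'r') as f:
        header = f.readline().rstrip('\r\n')
    return [(path, position, column)
            for position, column in enumerate(header.split('\t'), start=1)]

# Example: parse_header('CUSTOMER_20120101.txt')
#   -> [('CUSTOMER_20120101.txt', 1, 'CUST_ID'), ('CUSTOMER_20120101.txt', 2, 'CUST_NAME'), ...]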
Mapping: The source-to-target mappings are derived by parsing the "Bill of Lading" file, which contains a cross-reference of multiple filenames to target tables and, in this case, a grouping column. The original file also carried an external name; this is no longer needed.
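As a sketch only, the "Bill of Lading" can be read the same way; the column headings FILE_NAME, TARGET_TABLE, and LOAD_GROUP below are assumed names for illustration, not the client's actual layout:

import csv

def parse_bill_of_lading(path):
    """Return one (source filename, target table, group) mapping row per Bill of Lading record."""
    with open(path, 'r') as f:
        return [(row['FILE_NAME'], row['TARGET_TABLE'], row['LOAD_GROUP'])
                for row in csv.DictReader(f, delimiter='\t')]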
Prittie Approach Specific Logic: From these three inputs the application generates Netezza ANSI-standard SQL code that creates an external table with columns ordered by the incoming filename and associated with the DDL column formats, along with the associated Netezza logic for external filename allocation and logging. A simple SQL statement is generated that checks for duplicates and data errors, which can be logged. In addition, I have included the ability to generate the drop and insert logic that completes the "Prittie" import and load architecture for Netezza.
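To make that concrete, here is a simplified sketch of the kind of code the generator emits for one file. It follows the general Netezza external table pattern (CREATE EXTERNAL TABLE ... USING (DATAOBJECT ... DELIMITER ... SKIPROWS ... LOGDIR ...)), but the EXT_ staging prefix, the truncate-and-insert style, and the duplicate check are assumptions for illustration rather than the production templates:

def generate_load_sql(file_name, target_table, columns, log_dir='/nzlog'):
    """Generate the external table DDL, a duplicate check, and the drop/insert logic for one source file.
    'columns' is the ordered list of (column name, column format) pairs taken from the
    Source-Target-Mapping cross reference for this file."""
    ext_table = 'EXT_' + target_table                          # assumed staging-table naming convention
    col_defs = ',\n    '.join('%s %s' % (name, fmt) for name, fmt in columns)
    col_names = ', '.join(name for name, _ in columns)

    create_external = (
        "CREATE EXTERNAL TABLE %s (\n    %s\n)\n"
        "USING (DATAOBJECT ('%s') DELIMITER '\\t' SKIPROWS 1 LOGDIR '%s');"
        % (ext_table, col_defs, file_name, log_dir)
    )
    duplicate_check = (
        "SELECT %s, COUNT(*) FROM %s GROUP BY %s HAVING COUNT(*) > 1;"
        % (col_names, ext_table, col_names)
    )
    drop_and_insert = (
        "TRUNCATE TABLE %s;\nINSERT INTO %s (%s)\nSELECT %s FROM %s;"
        % (target_table, target_table, col_names, col_names, ext_table)
    )
    return create_external, duplicate_check, drop_and_insert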
Application Stability: The Source and Mapping file designs are relatively stable. The Target depends on some of the idiosyncrasies found in the DDL created by the query tool used; with some additional effort this code can be enhanced to handle some "looseness". The issue is in how the code (DDL) is generated and by what tool. The rest of the application is shielded from this by the creation of a "Source-Target-Mapping Cross Reference" table that is used in the actual code generation.
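A rough sketch of how that cross reference might be assembled from the three parsed inputs follows; matching source to target columns by name, and the row layout shown, are my assumptions for illustration:

def build_cross_reference(source_rows, target_rows, mapping_rows):
    """Join source columns to target columns by name, within the file-to-table mapping.
    Produces one (file, table, group, file position, column, format) row per mapped column."""
    formats = {(table, column.upper()): fmt
               for table, _, column, fmt in target_rows}
    xref = []
    for file_name, table, group in mapping_rows:
        for src_file, src_pos, src_col in source_rows:
            if src_file == file_name and (table, src_col.upper()) in formats:
                xref.append((file_name, table, group, src_pos, src_col,
                             formats[(table, src_col.upper())]))
    return xref

The code generation then reads only this cross reference, so any change in the DDL parsing or the header layout has to be absorbed in just one place.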
ETL Potential: As we have discussed, the application could easily be extended to handle basic ETL. This could be accomplished by including one more table that contains transformations for specific table and column combinations or data types. An example would be a new mapping table that converts Oracle datetime values to ANSI format as required. This is a simple example, but the underlying design could handle relatively complex transformations. We could also move the location and logging directory into a table to make the application even more metadata driven.
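A sketch of that extension: the transformation table becomes one more lookup, keyed here by target table and column, that is applied as each SELECT expression is generated. The TO_TIMESTAMP example and the column names are assumptions, not rules taken from the client's mappings:

# Hypothetical transformation rules keyed by (target table, column); '{col}' marks where
# the source column expression is substituted during code generation.
TRANSFORMS = {
    ('CUSTOMER', 'CREATED_DT'): "TO_TIMESTAMP({col}, 'DD-MON-YYYY HH24:MI:SS')",
}

def select_expression(table, column):
    """Return the SELECT expression for one column, applying a mapped transformation if one exists."""
    template = TRANSFORMS.get((table, column.upper()), '{col}')
    return template.format(col=column)

# select_expression('CUSTOMER', 'CREATED_DT')
#   -> "TO_TIMESTAMP(CREATED_DT, 'DD-MON-YYYY HH24:MI:SS')"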
Doug Prittie and Ira Warren Whiteside