Guerrilla MDS MDM The Road To Data Governance

Watch “Guerrilla MDS MDM The Road To Data Governance” by @irawhiteside on @PASSBIVC channel here: youtu.be/U0TtQUhch-U #SQLServer #SQLPASS

For tomorrow’s MDM you must be ready to embrace Agile, iterative, and/or Extreme Scoping approaches in order to fully realize the benefits and learn the constraints of MDS.

Posted in Data Governance, data profiling, Dimensional Model, Extreme Scoping Agile SCRUM, Fuzzy Matching, Guerilla MDM, Guerilla MDS, Mdm, MDS, MDS Stored Procedures, SQL Saturday

Creating a Metadata Mart via TSQL – Complete Data Profiling Kit – Download

With data profiling you can apply the age-old management adage, “You get what you inspect, not what you expect” (Reader’s Digest).

This article will describe how to implement a data profiling dashboard in Excel and a metadata repository as well as the TSQL required to load and update the repository.

Future articles will explore the data model as well as column and table relationship analysis using the Domain profiling results.

Data profiling is essential to properly determine inconsistencies, as well as data transformation requirements, for integration efforts.

It is also important to be able to communicate the general data quality for the datasets or tables you will be processing.

With the assistance over the years of a few friends (Joe Novella, Scott Morgan and Michael Capes), as well as the work of Stephan DeBlois, I have created a set of TSQL scripts that will create a set of tables providing the statistics to present your clients with a Data Quality Scorecard comparable to existing vendor tools such as Informatica, DataFlux, DataStage and the SSIS Data Profiling Task.

This article contains a Complete Data Profiling Kit – Download (code, Excel dashboards and samples) providing capabilities similar to leading vendor tools such as Informatica, DataStage and DataFlux.

The primary difference is that both the repository and the code are open, available and customizable. The profiling process has the following steps:

  1. Create Table Statistics – total count of records.
  2. Create Column Statistics – a specific set of statistics for each column (e.g. minimum value, maximum value, distinct count, mode pattern, blank count, null count).
  3. Create Column Domain Statistics – domain count (count of unique values) and domain pattern (SSN = 999-99-9999, ZIP = 99999-9999).
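As a minimal sketch of step 2, the per-column statistics can be gathered with an aggregate query like the one below. The kit builds one such query per column via dynamic SQL; the `Customers` table, `Zip` column and `ColumnStat` column names here are illustrative assumptions, not the kit’s exact schema.

```sql
-- Sketch: per-column statistics for one column (Customers.Zip).
-- Table and column names are illustrative only.
INSERT INTO ODS_METADATA.dbo.ColumnStat
    (DatabaseName, SchemaName, TableName, ColumnName,
     MinValue, MaxValue, DistinctCount, NullCount, BlankCount)
SELECT
    'CustomerDB', 'dbo', 'Customers', 'Zip',
    MIN(Zip),                                                -- minimum value
    MAX(Zip),                                                -- maximum value
    COUNT(DISTINCT Zip),                                     -- distinct count
    SUM(CASE WHEN Zip IS NULL THEN 1 ELSE 0 END),            -- null count
    SUM(CASE WHEN LTRIM(RTRIM(Zip)) = '' THEN 1 ELSE 0 END)  -- blank count
FROM CustomerDB.dbo.Customers;
```

Running one aggregate pass per column keeps the results comparable across tables, which is what the dashboard pivots on.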

Here is a sample of the Column Statistics Dashboard. Three panels are shown, one per worksheet, for a sample “Customers” file.

Complete Dashboard:

ProfilngDashboard1Panoramic

Column Profiling Dashboard 1-3:

ProfilngDashboard

 Column Profiling Dashboard 2-3:

ProfilngDashboard1

Column Profiling Dashboard 3-3:

ProfilngDashboard2

Domain Analysis:

 

In the example below you see an Excel worksheet containing a pivot table that lets you examine a column’s patterns, in this case Zip code, and subsequently drill into the actual values related to one of the patterns. In the Zip code example, we will review the pattern “9999”, a Zip code with only 4 numeric digits. When you click on the pattern “9999”, the actual values behind it are revealed.
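One common way to derive such a domain pattern is to translate every digit to “9”. A sketch, again assuming a `Customers` table with a `Zip` column (note that TRANSLATE requires SQL Server 2017+; the original kit predates it and would use nested REPLACEs or a loop instead):

```sql
-- Sketch: derive a digit pattern for Zip, then drill into one pattern.
SELECT Zip,
       TRANSLATE(Zip, '0123456789', '9999999999') AS ZipPattern
INTO #ZipPatterns
FROM CustomerDB.dbo.Customers;

-- Pattern frequencies (what the pivot table summarizes):
SELECT ZipPattern, COUNT(*) AS DomainCount
FROM #ZipPatterns
GROUP BY ZipPattern
ORDER BY DomainCount DESC;

-- Drill into the suspect 4-digit pattern:
SELECT Zip
FROM #ZipPatterns
WHERE ZipPattern = '9999';
```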

 Domain Analysis for ZipCode

ProfilingDomainAnalysis

Domain Analysis for Phone1

 

ProfilingDomainAnalysis2

 

Running the Profiling Scripts Manually

Prerequisites:

The scripts support two databases: one is the MetadataMart for storing the profiling results; the other is the source database for your profiling.

There are four scripts; simply run them in the following order:

  1. 0_Create Profilng Objects – Creates all the Data Profiling tables and views.

  2. 1_Load_TableStat – Loads records into the TableStat profiling table.

  3. 2_Load ColumnStat – Loads records into the ColumnStat profiling table. Specify the database where the Data Profiling tables reside (DECLARE @MetadataDB VARCHAR(256) SET @MetadataDB = ‘ODS_METADATA’), then specify Database, Schema and Table name filters as needed. For example, to profile every table whose name starts with “Dim”, change the table filter to SET @TABLE_FILTER = ‘Dim%’.

    SET @DATABASE_FILTER = ‘CustomerDB’
    SET @SCHEMA_FILTER = ‘dbo’
    SET @TABLE_FILTER = ‘%Customer%’
    SET @COLUMN_FILTER = ‘%’

  4. 3_load DomainStat – Loads records into the DomainStat profiling table. Specify the database where the Data Profiling tables reside (DECLARE @MetadataDB VARCHAR(256) SET @MetadataDB = ‘ODS_METADATA’), then specify Database, Schema and Table name filters as needed. For example, to profile every table whose name starts with “Dim”, change the table filter to SET @TABLE_FILTER = ‘Dim%’.

    SET @DATABASE_FILTER = ‘CustomerDB’
    SET @SCHEMA_FILTER = ‘dbo’
    SET @TABLE_FILTER = ‘%Customer%’
    SET @COLUMN_FILTER = ‘%’

  5. -1_DataProfiling – Restart – Deletes and recreates all profiling tables.
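Under the hood, load scripts like these typically enumerate the columns to profile from the source database’s metadata and then build dynamic SQL per column. A sketch of how the four filter variables narrow that work list (the variable names match the scripts; the SELECT itself is illustrative, not the scripts’ exact code):

```sql
-- Filter variables as set in the load scripts.
DECLARE @MetadataDB      VARCHAR(256) = 'ODS_METADATA'
DECLARE @DATABASE_FILTER VARCHAR(256) = 'CustomerDB'
DECLARE @SCHEMA_FILTER   VARCHAR(256) = 'dbo'
DECLARE @TABLE_FILTER    VARCHAR(256) = '%Customer%'
DECLARE @COLUMN_FILTER   VARCHAR(256) = '%'

-- Work list of columns to profile; each row would drive one
-- dynamically built statistics query.
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM CustomerDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG LIKE @DATABASE_FILTER
  AND TABLE_SCHEMA  LIKE @SCHEMA_FILTER
  AND TABLE_NAME    LIKE @TABLE_FILTER
  AND COLUMN_NAME   LIKE @COLUMN_FILTER;
```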

Over at Actuality Business Intelligence LLC, we are offering a six-month free trial of our Microsoft SSIS Data Profiling Data Flow Components, completely unrestricted, for SQL Server 2008R2/2012.

 

Complete the contact form below to receive a download link.

Posted in agile, Big Data, data profiling, extreme scoping, Extreme Scoping Agile SCRUM, Ira Whiteside, Kimball ETL (subsystem 1), Uncategorized

Integrating Master Data Services Into The Enterprise

irawarrenwhiteside:

Excellent blog by Chris on the MDS underpinnings and Metadata.

Originally posted on Bluewater SQL:

A lot of what is typically discussed in regards to Master Data Services (MDS) revolves around either the Silverlight web-based interface or the Excel plug-in. What’s not often discussed are the back-end integration scenarios for loading, extracting (querying) and the otherwise day-to-day management or reporting requirements a Master Data solution might involve. In this post, we will look at common integration scenarios and techniques for handling each within MDS.

Loading Data (Batch)

Out-of-the-box MDS has facilities to stage and easily load data in batch. For each entity (and hierarchy) you create, a staging table is also created. For leaf members the staging table takes the form of:

stg.<Entity Name>_Leaf


Data is then loaded directly to the table using an SSIS package or any other tool capable of writing to a SQL Server table. When all the data is loaded, the batch can be started programmatically by calling a stored procedure…

View original 1,509 more words
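The staging flow described in the excerpt can be sketched as follows for a hypothetical Product entity. The stg.&lt;Entity&gt;_Leaf table shape (ImportType, ImportStatus_ID, BatchTag columns) and the stg.udp_&lt;Entity&gt;_Leaf procedure are MDS’s documented staging interface; the entity, code and version names here are assumptions.

```sql
-- 1. Load members into the entity's leaf staging table.
INSERT INTO stg.Product_Leaf (ImportType, ImportStatus_ID, BatchTag, Code, Name)
VALUES (0,            -- 0 = create new members / update with non-NULL values
        0,            -- 0 = ready for staging
        'Batch001',   -- tag used to start and track this batch
        'PRD-001', 'Widget');

-- 2. Start the batch by calling the entity's staging procedure.
EXEC stg.udp_Product_Leaf
     @VersionName = 'VERSION_1',
     @LogFlag     = 1,           -- log transactions for this batch
     @BatchTag    = 'Batch001';
```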

Posted in Uncategorized

On MDM, Data Models and Big Data

irawarrenwhiteside:

Interesting discussion on MDM, logical data models and Big Data, hashing around approaches.

Originally posted on Liliendahl on Data Quality:

As described in the post Small Data with Big Impact my guess is that we will see Master Data Management solutions as a core element in having data architectures that are able to make sustainable results from dealing with big data.

If we look at party master data, a serious problem with many ERP and CRM systems around is that the data model for party master data isn’t good enough for dealing with the many different forms and variations in which the parties we hold data about are represented in big data sources, which makes the linking between traditional systems of record and big data very hard.

Having a Master Data Management (MDM) solution with a comprehensive data model for party master data is essential here.

Some of the capabilities we need are:

Storing multiple occurrences of attributes

People and companies have many phone numbers, they have many eMail addresses…

View original 226 more words
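The multi-occurrence capability the excerpt calls for can be sketched as a simple party data model, with one row per party and many rows per party for each repeatable contact attribute (table and column names are illustrative, not from any particular MDM product):

```sql
-- Sketch: party master data with multiple occurrences of phone/eMail.
CREATE TABLE dbo.Party (
    PartyID   INT IDENTITY PRIMARY KEY,
    PartyName NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.PartyPhone (
    PartyID   INT NOT NULL REFERENCES dbo.Party (PartyID),
    PhoneType VARCHAR(20) NOT NULL,   -- e.g. 'Mobile', 'Office'
    PhoneNo   VARCHAR(30) NOT NULL,
    PRIMARY KEY (PartyID, PhoneType, PhoneNo)
);

CREATE TABLE dbo.PartyEmail (
    PartyID   INT NOT NULL REFERENCES dbo.Party (PartyID),
    Email     NVARCHAR(320) NOT NULL,
    PRIMARY KEY (PartyID, Email)
);
```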

Posted in Uncategorized

Ways of Sharing Master Data

Originally posted on Liliendahl on Data Quality:

The “buy vs. build” option is well known within many disciplines, not least around your IT application stack. The trend here is that where in the old days you did a lot of in-house programming, today you tend to buy more and more stuff to prevent reinventing the wheel. Yesterday there was a post on that on Informatica Perspectives. The post is called Stop The Hand-Coding Madness!

We certainly also see that trend when it comes to Master Data Management (MDM) solutions. And my guess is that we will see that trend too when it comes to the master data itself.

What has puzzled me over the years is how a lot of organizations spend time on, and make their own errors in, typing in the name, address and other core data about the individuals and companies they do business with, or alternatively let their business partners type in…

View original 436 more words

Posted in Uncategorized