Low cost strategic Business Analytics and Data Quality
This paper will outline the advantages of developing your complete Business Analytics solution within the Microsoft suite of Business Intelligence capabilities, including Microsoft SQL Server(SSIS, SSAS, SSRS and Microsoft Office).
This approach will yield and very powerful and detailed Analytical application focused on your business and it particular needs both data sources and business rules. In addition to a reporting and analytical solution, you will have insight and access to the data and business rule “Lineage” or DNA allowing you click on any anomalies in any dash board and not only drill into the detail, but also see how data was transformed and/or changes as it was process to create your dashboard, excel workbook or report.
Lineage as it applies to data in Business Intelligence provides the capability see track back where data came from. Essentially it supports “Metric Decomposition” which is a process for breaking down a metric (business calculation) into separate parts. With this capability you can determine your data’s ancestry or Data DNA.
Why does this matter? The key to Lower Cost of Ownership, that’s why.
An “Absolute Truth” in today’s environment (Business Discovery, Big Data, Self Service Whatever?) is that you never have enough detail and usually have to stop clicking before you have an answer to your question and look at another report of application. Just when you about to get to the “bottom of it” you run out of clicks As Donald Farmer (VP Product Management) pointed out recently we are hunters and that’s why we like search engines. We enter phrases for things were hunting, and the search engine presents large list which we then follow the “tracks” of what we ae looking for , if we go down a wrong path we back track and continue, continually getting closer to our answer. Search engines implement lineage thru “key words “ and there relationship to web pages, which intern link to other pages. This capability is now “heuristic” or common sense.
While the method of analyzing data is “Rule of Thumb” for internet searches, it is not in terms of analyzing your business reports.
If you were able to “link” all your data from it the original source to the final destination, then would you have the ability to “hunt” through your data the way you hunt with Google, Bing etc…
Let apply this concept to customer records with phone numbers (phone1 column). First we clean the phone number (phone1Cleansed column). We keep both columns in our processing results. Then in addition we keep the name of the “Business Rule we used to clean it (phone1_Category), an indicator to identify valid or invalid phone numbers(phone1_Valid) and the actual column name used as a source(phone1_ColumnName). The “phone1_Rule” column is used to tell us which rule was used, the first row used only the Phone Parse rule to format the number, the second row used two rules, NumbersOnly tells us thet the number contains non numeric character and the Phone Parse rule formats the number with periods separating the area code, prefix and suffix.
The end result is when this is included in your reporting or analytical solution you know:
- Who: The process that applied the change, if you also log package name(Optional)
- What: The final corrected or standardized value(phone1Cleansed)
- Where: Which column originated the value.(phone1_ColumnName)
- When: The date and time of the change.(Optional)
- Why: Which rules were invoked to cause the transformation.(phone1_Category, phone1_Rule and phone1_Valid)
In addition if you then choose to implement data mining and/or “Predictive Analytics” already a part of the Microsoft BI Suite, you will be facing the typical pitfalls associated with trying to predict based on either “dirty data” or highly “scrubbed data”, lacking lineage.
I have assembled a collection of vendors product that when used in conjunction with the Microsoft BI suite can provide this capability at reasonable in line with your investment with Microsoft SQL Server.
If done manually this would require extensive additional coding, however with the tools we have selected, this can be accomplished automatically as well as automating the loading of an Analysis Services cube for Analytics.
Our solution brings together the Microsoft offerings of Actuality Business Intelligence, Melissa Data and HaloBI. This solution can be implemented by any moderately skilled Microsoft developer familiar with the BI Suite(SSIS and SSAS) and does not require any expensive niche ETL or Analytical software.
In the next post I will walk through a real world implementation.
Microsoft SSIS Package