Category Archives: Awarement
It is absolutely amazing to me how spot-on both 1984 and Brave New World, books from the 1930s and 1940s, describe exactly what’s happening today with our current leadership.
Misinformation is now our reality
Awarement: compare and contrast what was described in this book with what’s happening today under the current leadership.
Data Governance: Navigating the Information Value Chain, Demystifying the Path Forward
The challenge for businesses is to seek answers to questions. They do this with metrics (KPIs) and by knowing the relationships of the data, organized by logical categories (dimensions), that make up the result or answer to the question. This is what constitutes the Information Value Chain.
Let’s assume that you have a business problem, a business question that needs answers and you need to know the details of the data related to the business question.
Information Value Chain
- Business is based on Concepts.
- People think in terms of Concepts.
- Concepts come from Knowledge.
- Knowledge comes from Information.
- Information comes from Formulas.
- Formulas determine Information relationships based on quantities.
- Quantities come from Data.
- Data physically exist.
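The chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ORDERS` rows, the `Information` class and the `gross_margin_pct` formula are hypothetical names chosen for the example, not part of any real system. The point is that a result keeps a reference to its formula, its quantities and its source rows, so you can drill through from the answer back to the data.

```python
from dataclasses import dataclass

# Hypothetical raw data: rows that "physically exist" at the bottom of the chain.
ORDERS = [
    {"region": "East", "revenue": 100.0, "cost": 60.0},
    {"region": "East", "revenue": 300.0, "cost": 240.0},
    {"region": "West", "revenue": 90.0, "cost": 45.0},
]


@dataclass
class Information:
    """A result that remembers how it was produced (drill-through)."""
    name: str
    value: float
    formula: str        # the formula that relates the quantities
    quantities: dict    # the quantities the formula was applied to
    source_rows: list   # the data the quantities came from


def gross_margin_pct(rows, region):
    """Data -> quantities -> formula -> information, for one dimension value."""
    selected = [r for r in rows if r["region"] == region]   # dimension filter
    revenue = sum(r["revenue"] for r in selected)            # quantity
    cost = sum(r["cost"] for r in selected)                  # quantity
    value = (revenue - cost) / revenue * 100 if revenue else 0.0
    return Information(
        name="gross_margin_pct",
        value=value,
        formula="(revenue - cost) / revenue * 100",
        quantities={"revenue": revenue, "cost": cost},
        source_rows=selected,
    )


east = gross_margin_pct(ORDERS, "East")
print(east.name, east.value)            # the information
print(east.formula, east.quantities)    # the formula and its quantities
print(east.source_rows)                 # the data behind the answer
```

Keeping the lineage on the result itself is one simple design choice; a real implementation would more likely store it in metadata tables.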
In today’s fast-paced, high-tech business world, this basic navigation (drill-through) concept is fundamental, yet it seems to be overlooked in the zeal to embrace modern technology.
In the quest to embrace fresh technological capabilities, a business must realize that it can only truly discover new insights when it can validate them against its business model, that is, the Information Value Chain that is currently creating its information or results.
Today data needs to be deciphered into information so that formulas can be applied to determine relationships and validate concepts, in real time.
We are inundated with technical innovations and concepts. It’s important to note that business is driving these changes, not necessarily technology.
Business is constantly striving for better insights, better information and increased automation, as well as lower cost while doing these things. Several of these were examined in John Thuma’s latest article.
Historically these changes were few and far between. However, innovation in hardware and storage (technology), as well as in software and compute, has led to a rapid unveiling of newer concepts as well as new technologies.
Demystifying the path forward.
In this article we’re going to review the basic principles of information governance required for a business to measure its performance, as well as explore some of the connections to some of these new technological concepts for lowering cost.
To a large degree I think we’re going to find that why we do things has not changed significantly; it’s just how. We now have different ways to do them.
While embracing new technology, it’s important to keep in mind that some of the basic concepts, ideas and goals on how to properly structure and run a business have not changed, even though many more insights and much more information and data are now available.
My point is that implementing these technological advances could be worthless to the business, and maybe even destructive, unless they are associated with an actual set of Business Information Goals (measurements, KPIs) and are linked directly with understandable business deliverables.
Moreover, prior to even considering engaging data science or attempting data mining, you should organize your datasets, capture their relationships and apply a “scoring” or “ranking” process, and be able to relate them to your business information model or Information Value Chain, with the concept of quality applied in real time.
The foundation for a business to navigate its Information Value Chain is an underlying Information Architecture. An Information Architecture typically involves a model or concept of information that is used and applied to activities which require explicit details of complex information systems.
Subsequently, data management and databases are required; they form the foundation of your Information Value Chain. To bring this back to the business goal, let’s take a quick look at the difference between relational database technology and graph technology as a part of emerging big data capabilities.
However, the timeframe of database technology evolution has introduced a cultural aspect to implementing new technology changes: basically, resistance to change. Businesses that are running their current operations with technology and people from the 80s and 90s have a different perception of a solution than folks from the 2000s.
Therefore, in this case, regarding a technical solution, “perception is not reality”; awarement is. Businesses need to find ways to bridge the knowledge gap and increase awarement that simply embracing new technology will not fundamentally change why a business operates; however, it will affect how.
Relational databases were introduced in 1970, and graph database technology was introduced in the mid-2000s.
There are many topics included in the current Big Data concept to analyze, however the foundation is the Information Architecture, and the databases utilized to implement it.
There were some other advancements in database technology in between as well; however, let’s focus on these two.
In a 1970s relational database, based on mathematical set theory, you could pre-define the relationships between tables, implement them in a hardened structure, then query them by manually joining the tables through explicitly named attributes, and gain much better insight than with previous database technology. However, if you needed a new relationship, it would require manual effort and then a migration from old to new. In addition, your answer was only as good as the hard-coded query that created it.
In the mid-2000s the graph database was introduced. Based on graph theory, it defines relationships as tuples containing nodes and edges. Graphs represent things, and relationships describe connections between things, which makes a graph an ideal fit for navigating relationships. Unlike conventional table-oriented databases, graph databases (for example Neo4j, Neptune) represent entities and the relationships between them. New relationships can be discovered and added easily and without migration, with much less manual effort.
Nodes and Edges
Graphs are made up of ‘nodes’ and ‘edges’. A node represents a ‘thing’ and an edge represents a connection between two ‘things’. The ‘thing’ in question might be a tangible object, such as an instance of an article, or a concept such as a subject area. A node can have properties (e.g. title, publication date). An edge can have a type, for example to indicate what kind of relationship the edge represents.
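To make nodes, edges, properties and edge types concrete, here is a minimal in-memory sketch in plain Python. It is not how Neo4j or Neptune work internally, and every name in it (`Graph`, `add_node`, `add_edge`, `neighbors`, the node IDs and the edge types) is a hypothetical choice made for illustration. It does show the key point from above: a new kind of relationship is just another edge, with no table migration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)   # e.g. title, publication date


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str   # what kind of relationship this edge represents


class Graph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, edge_type):
        # Both ends must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before they can be connected")
        self.edges.append(Edge(source, target, edge_type))

    def neighbors(self, node_id, edge_type=None):
        """Follow outgoing edges, optionally restricted to one edge type."""
        return [e.target for e in self.edges
                if e.source == node_id
                and (edge_type is None or e.edge_type == edge_type)]


g = Graph()
g.add_node("article-1", title="Navigating the Information Value Chain")
g.add_node("subject-dg", name="Data Governance")      # a concept, not a tangible thing
g.add_edge("article-1", "subject-dg", "ABOUT")

# Later a new kind of relationship is needed: add it, no schema change or migration.
g.add_node("author-1", name="Example Author")
g.add_edge("article-1", "author-1", "WRITTEN_BY")

print(g.neighbors("article-1"))
print(g.neighbors("article-1", "ABOUT"))
```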
The takeaway: there are many spokes on the cultural wheel in a business today, encompassing business acumen, technology acumen, information relationships and raw data knowledge. While they are all critical to success, the absolutely critical step is that the logical business model, defined as the Information Value Chain, is maintained and enhanced.
It is a given that all businesses desire to lower cost and gain insight into information. It is imperative that a business maintain and improve its ability to provide accurate information that can be audited and traced, and to navigate the Information Value Chain. Data science can only be achieved after a business fully understands its existing Information Architecture and strives to maintain it.
Note as I stated above an Information Architecture is not your Enterprise Architecture. Information architecture is the structural design of shared information environments; the art and science of organizing and labelling websites, intranets, online communities and software to support usability and findability; and an emerging community of practice focused on bringing principles of design, architecture and information science to the digital landscape. Typically, it involves a model or concept of information that is used and applied to activities which require explicit details of complex information systems.
In essence, a business needs a Rosetta Stone in order to translate past, current and future results.
In future articles we’re going to explore and dive into how these new technologies can be utilized and more importantly how they relate to all the technologies.
Merry Christmas: Data Classification, Feature Engineering, Data Governance. ‘How to’ do it, and some code. Take a look.
I was heavily involved in business intelligence, data warehousing and data governance until several years ago and have recently had many chaotic personal challenges. Upon returning to professional practice I have discovered that things have not changed that much in 10 years. The governance methodologies and approaches are still relatively consistent; however, the tools and techniques have changed and, in my opinion, not for the better. Without focusing on specific tools, I’ve observed that the core of data governance or MDM is enabling and providing a capability for classifying data into business categories or nomenclature, and it has really not improved.
- This basic traditional approach has not changed. In essence, an AI model predicts a Metric and is wholly based on the integrity of its features, or Dimensions.
Therefore I decided to update some of the techniques and code patterns I’ve used in the past regarding the Information Value Chain and/or record linkage, and we are going to make the results available with associated business and code examples, initially with SQL Server and Databricks plus Python.
My good friend Jordan Martz of DataMartz fame has greatly contributed to this old man’s Big Data enlightenment, as has Craig Campbell, in updating some of the basic classification capabilities required and critical for data governance. If you would like a more detailed version of the source as well as the test data, please send me an email at firstname.lastname@example.org. Stay tuned for more updates; soon we will add neural network capability for additional automation of “Governance Type” classification and confidence monitoring.
Before we focus on functionality, let’s focus on methodology.
Initially, understand the key metrics (KPIs) to be measured, their formulas and, of course, the business’s expectations of their calculations.
Immediately gather file sources and complete profiling as specified in my original article found here.
Implementing the processes in my metadata mart article would provide numerous statistics regarding integer or float fields; however, there are some special considerations for text fields or smart codes.
Before beginning classification you would employ similarity matching or fuzzy matching as described here.
As I said, I posted the code for this process on SQL Server Central 10 years ago; here is a Python version.
Roll Your Own – Python Jaro_Winkler (Python)
Step 1a – import pandas
Step 2 – Import Libraries
from pyspark.sql.functions import input_file_name
from pyspark.sql.types import *
import datetime, time, re, os, pandas
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, NGram, HashingTF, IDF, Word2Vec, Normalizer, Imputer, VectorAssembler
from pyspark.ml import Pipeline
from mlflow.tracking import MlflowClient
from sklearn.cluster import KMeans
import numpy as np
Step 3 – Test JaroWinkler
Step 4a – Implement JaroWinkler (Fuzzy Matching)
def JaroWinkler(str1_in, str2_in):
    # Returns the Jaro similarity (0.0 to 1.0) of two strings, rounded to 6 places.
    # Note: no Winkler prefix bonus is applied in this version.
    if(str1_in is None or str2_in is None):
        return 0.0
    len_str1 = len(str1_in)
    len_str2 = len(str2_in)
    if len_str1 == 0 or len_str2 == 0:
        return 0.0
    if len_str1 > len_str2:
        max_len = len_str1
    else:
        max_len = len_str2
    # m is the match window: characters only match if they are at most m positions apart
    m = max(max_len // 2 - 1, 0)
    # FStatus flags: set to 1 once the character at that position has been matched
    status1 = [0] * len_str1
    status2 = [0] * len_str2
    common = 0
    i = 1
    while(i <= len_str1):
        a1 = str1_in[i-1]
        f = max(1, i - m)
        z = min(i + m, len_str2)
        while (f <= z):
            a2 = str2_in[f-1]
            if(a2 == a1 and status2[f-1] == 0):
                common = common + 1
                status1[i-1] = 1
                status2[f-1] = 1
                break
            f = f + 1
        i = i + 1
    # Count transpositions: matched characters that appear in a different order
    tr = 0.0
    z = 1
    i = 1
    while(i <= len_str1):
        if(status1[i-1] == 1):
            while(status2[z-1] == 0):
                z = z + 1
            if(str1_in[i-1] != str2_in[z-1]):
                tr = tr + 0.5
            z = z + 1
        i = i + 1
    wcd = 1.0/3.0
    wrd = 1.0/3.0
    wtr = 1.0/3.0
    jaro_value = 0.0
    if (common != 0):
        jaro_value = (wcd * common) / len_str1 + (wrd * common) / len_str2 + (wtr * (common - tr)) / common
    return round(jaro_value, 6)
Step 4b – Register JaroWinkler
# DoubleType() makes the UDF return a number, so ORDER BY on the score sorts numerically
spark.udf.register("JaroWinkler", JaroWinkler, DoubleType())
Step 8a – Bridge vs Master vs Associative ALL
DROP TABLE IF EXISTS NameAssociative;
CREATE TABLE NameAssociative AS
SELECT
 -- regexp_replace strips everything except letters, digits, commas and spaces before hashing
 sha2(regexp_replace(a.NameLookup, '[^a-zA-Z0-9, ]', ' '), 256) AS NameLookupCleaned
,a.NameLookupKey
,sha2(regexp_replace(b.NameInput, '[^a-zA-Z0-9, ]', ' '), 256) AS NameInputCleaned
,b.NameInputKey
,JaroWinkler(a.NameLookup, b.NameInput) AS MatchScore
-- MatchRank is an assumed alias; the original alias was garbled in this listing
,RANK() OVER (PARTITION BY a.DetailedBUMaster ORDER BY JaroWinkler(a.NameLookup, b.NameInput) DESC) AS MatchRank
FROM NameLookup AS a
CROSS JOIN NameInput AS b;
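For readers without a Spark cluster, the same cross-join-score-and-rank idea can be sketched in plain Python. This is an assumption-laden illustration, not the pipeline above: it swaps in `difflib.SequenceMatcher` from the standard library as a stand-in similarity score instead of Jaro-Winkler, numbers candidates 1, 2, 3 per input (row-number style, so ties are not given equal rank as SQL `RANK()` would), and the name lists and the `rank_matches` helper are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a, b):
    """Stand-in scorer (SequenceMatcher ratio, not Jaro-Winkler); 0.0 to 1.0."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio(), 6)


# Hypothetical master (lookup) names and incoming (input) names.
lookup_names = ["Acme Corporation", "Globex Inc", "Initech LLC"]
input_names = ["ACME Corp", "Initech", "Globex Incorporated"]


def rank_matches(inputs, lookups):
    # Cross join: score every input name against every lookup name.
    scored = [(i, l, similarity(i, l)) for i, l in product(inputs, lookups)]
    ranked = []
    for name in inputs:
        # Partition by input name, order by score descending, then number the rows.
        candidates = sorted((row for row in scored if row[0] == name),
                            key=lambda row: row[2], reverse=True)
        for rank, (inp, look, score) in enumerate(candidates, start=1):
            ranked.append({"input": inp, "lookup": look,
                           "score": score, "rank": rank})
    return ranked


matches = rank_matches(input_names, lookup_names)
best = {m["input"]: m["lookup"] for m in matches if m["rank"] == 1}
for name, lookup in best.items():
    print(name, "->", lookup)
```

A cross join grows as inputs times lookups, so on real volumes you would normally block or pre-filter candidates before scoring.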
Organic Data Quality vs Machine Data Quality
Reading a few recent blog posts on centralized data quality vs. distributed data quality has provoked me to offer another point of view.
Organic, distributed data quality provides today’s awareness for all current source analysis and problem resolution, and in most cases it is accomplished manually by individuals.
In many cases when management or leadership (folks worried about their jobs) are presented with any type of organized, machine-based data quality results that can easily be viewed and understood and are permanently available, the usual result is to kill or discredit the messenger.
Automated “Data Quality Scoring” (Profiling + Metadata Mart) brings them from a general “awareness” to a realized state, “awarement”.
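As a rough idea of what an automated data quality score from profiling could look like, here is a small sketch. Everything in it is an assumption made for illustration: the `profile_column` and `pattern_mask` helpers, the sample codes, and especially the scoring rule (completeness multiplied by conformity to the most common pattern), which is just one simple possibility and not a standard.

```python
import re
from collections import Counter


def pattern_mask(value):
    """Letters become 'A' and digits become '9', so 'AB-123' becomes 'AA-999'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values):
    """Profile one text column and derive a single 0-to-1 quality score."""
    total = len(values)
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    masks = Counter(pattern_mask(str(v)) for v in non_null)
    dominant_mask, dominant_count = masks.most_common(1)[0] if masks else ("", 0)
    completeness = len(non_null) / total if total else 0.0
    conformity = dominant_count / len(non_null) if non_null else 0.0
    return {
        "rows": total,
        "nulls": total - len(non_null),
        "distinct": len(set(non_null)),
        "dominant_mask": dominant_mask,
        "completeness": round(completeness, 4),
        "conformity": round(conformity, 4),
        # Assumed scoring rule: share of rows that are populated AND follow the main pattern.
        "score": round(completeness * conformity, 4),
    }


# Hypothetical "smart code" column with two empty values and one malformed code.
codes = ["AB-123", "CD-456", None, "EF-789", "bad code", ""]
result = profile_column(codes)
for key, value in result.items():
    print(key, value)
```

Stored per column and per load, numbers like these are what make the results permanently available rather than living in one analyst’s head.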
Awarement is the established form of awareness. Once one has accomplished their sense of awareness they have come to terms with awarement.
It’s one thing to know that alcohol can get you drunk; it’s quite another to be aware that you are drunk when you are drunk.
Perception = Perception
Awarement = Reality
Departmental DQ = Awareness
Centralized DQ = Awarement
Share the love… of Data Quality– A distributed approach to Data Quality may result in better data outcomes
My Wife, My Family , My Stroke (Perception is Perception:Awarement is Reality)
Six years earlier the doctors had told my wife (Theresa) she had 18 months to live; she was diagnosed with Stage 4 cancer. She is now in full remission. A woman like her mother, she never drank and never smoked.
On June 3, 2014 I had a brain stem stroke (pons).
I have always told my daughters “Men are idiots”, and apparently I have proven it, here is an example:
5:00 PM Monday: My arm starts twitching and doesn’t stop.
7:00 PM Monday: Arm still twitching. My wife (beautiful, wonderful, courageous and wise) Tessie says “Let me take you to the hospital”. Of course, like any normal “man”, I say “Nah, it’s no big deal”.
11:00 PM Monday: I take a conference call and notice my left arm and leg are a little numb. Obviously I take an aspirin and go back to bed.
5:00 AM Tuesday: I plan to catch a flight; my wife intervenes and I stay home.
7:00 AM Tuesday: I take another conference call. By now I am limping and can’t hold a cup of coffee in my left hand.
8:00 AM Tuesday: My wife now says we are going to the hospital.
8:30 AM Tuesday: In the hospital I am informed I had a pons (brain stem) stroke. I can no longer move my left leg or arm and have pretty slurred speech. As a bonus the doctors tell me that if I had come in within 5 hours of the first symptom they may have been able to help, but now it’s too late.
10:00 PM Thursday: My daughter Christina notices the nurses had taken off my bracelet and then administered a second injection of Lantus (long-acting insulin). Unfortunately I didn’t focus on what she pointed out. I would regret not listening.
11:00 PM Thursday: I pass out in a diabetic coma in front of my wife and daughter (Victoria). After frantic activity I end up in the ICU, back to square one.
Many illnesses happen in slow motion; this episode of unconsciousness (coma) came very suddenly. If it were not for Tessie’s calmness, control, faith in God and knowledge, I am sure I would not be here.
The nurses were unsure what to do. Apparently they had given me an extra dose of insulin, and they were very slow to respond.
My wife Theresa (Tessie), with Victoria’s help, literally had to keep me alert and practically drag me to the ICU over the nurses’ objections. You know it’s bad when the nurses are asking your wife for the “Power of Attorney”, which of course she had already given them, and they had lost it.
…..25 days later (diabetic coma, ICU, rehabilitation, multiple brain scans, seizure scans, seizures, diarrhea) I am released on the condition that I use a walker. Since then I have graduated to a cane. My wife stayed with me the entire 25 days.
I am very lucky. I am left with hemiparesis (weakness on the left side) and double vision (right side).
For 25 days …….. , 25 DAYS …….. my wife (Theresa) and my entire family never left my side (day or night):
Theresa (Wife, Soul Mate, My Everything), Christina (Daughter #1, The Rock), Victoria (Daughter #2, Brainiac), Julia (Granddaughter #1, Angel sent from heaven), Brandon (Son-in-Law), Jack (Grandson #1), Jacob (Grandson #2)
Tessie and our family at St Mary’s in earlier times.
As a man, husband, father and grandfather I have tried to be up to the task of providing for my family, sometimes succeeding, many times falling short, in my eyes.
My wife led a family effort that, as I write about it and remember it, leaves me overwhelmed and tearful. I wish I could say I deserve this, but I can’t; I was not good enough.
The reality is that as we grow older we grow wiser, not through knowledge alone, but through experience.
Apparently Pavlov was “more right” than Plato (Allegory of the Cave). Perception is not reality; awarement is reality, and it comes through pain.
This experience has given me the opportunity to eliminate any remnants of ego and has permanently instilled a sense of humility. Countless times during my 25-day stay, my family (ALL OF THEM) helped me dress, eat and go to the bathroom (with diarrhea).
As I said, at one point I went into a coma from low blood sugar and needed to receive emergency treatment and go to the ICU. It was one of many very traumatic experiences, and my wife Tessie never faltered, kept her faith and strength, and brought me back.
Epilogue: At one point I had to prove my cognitive skills were still intact. Since my wife and daughters have a small consulting company, Actuality Business Intelligence, we collaborated on creating an assessment and PowerPoint on how to improve the rehabilitation scheduling system and patient/therapist flow. From my perspective anything that made rehab like work was good.
To be clear my hospital room was family “Grand Central Station”
This post is dedicated to you. I love you with all my heart and soul, forever and ever. God has truly blessed me with you as my soul mate and with your love and care.
Your eternally grateful husband – Ira
Ira Warren Whiteside
Guerilla MDM via Microsoft MDS: Force Model Validation, or a T-SQL Script to Force-Set Validation Status IDs
Using metadata and code generation to programmatically set Validation Status IDs for all leaf members in an MDS entity.
Primarily this script relies on the MDS mdm.udpMemberValidationStatusUpdate procedure for a single member. Unfortunately I could not get the stored procedure for multiple members (mdm.udpMembersValidationStatusUpdate) to work, because I could not resolve an “operand type clash” error, so I created a script that relies on the single-member version (mdm.udpMemberValidationStatusUpdate).
The script requires the model name and entity name in order to generate a T-SQL statement that updates each member ID to the designated validation status.
IF OBJECT_ID('tempdb..#MemberIdList') IS NOT NULL
    DROP TABLE #MemberIdList
DECLARE @ModelName nvarchar(50) = 'Supplier'
DECLARE @Model_ID int
DECLARE @Version_ID int
DECLARE @Entity_ID int
DECLARE @Entity_Name nvarchar(50) = 'Supplier'
DECLARE @Entity_Table nvarchar(50)
DECLARE @sql nvarchar(500)
DECLARE @sqlExec nvarchar(500) = 'EXEC mdm.udpMemberValidationStatusUpdate '
DECLARE @ValidationStatus_ID int = 4
-- Found the following information in table [mdm].[tblList]
-- ListCode             ListName          Seq  ListOption
-- lstValidationStatus  ValidationStatus  0    New, Awaiting Validation
-- lstValidationStatus  ValidationStatus  1    Validating
-- lstValidationStatus  ValidationStatus  4    Validation Failed
-- lstValidationStatus  ValidationStatus  3    Validation Succeeded
-- lstValidationStatus  ValidationStatus  2    Awaiting Revalidation
-- lstValidationStatus  ValidationStatus  5    Awaiting Dependent Member Revalidation
-- MemberType_ID = 1 (Leaf Member)
DECLARE @MemberType_ID int = 1
-- NOTE: the FROM clauses below were missing from this listing.
-- mdm.viw_SYSTEM_SCHEMA_VERSION and mdm.tblEntity (with its ID and EntityTable
-- columns) are assumed sources; verify them against your MDS database.
-- Get Version ID for Model
SET @Version_ID = (SELECT MAX(ID)
                   FROM mdm.viw_SYSTEM_SCHEMA_VERSION
                   WHERE Model_Name = @ModelName)
-- Get Model ID for Model
SET @Model_ID = (SELECT TOP 1 Model_ID
                 FROM mdm.viw_SYSTEM_SCHEMA_VERSION
                 WHERE Model_Name = @ModelName)
-- Get Entity ID for specific Entity
SET @Entity_ID =
    (SELECT [ID]
     FROM [mdm].[tblEntity]
     WHERE [Model_ID] = @Model_ID
       AND [Name] = @Entity_Name)
-- Get Entity Table Name for specific Entity
SET @Entity_Table =
    (SELECT [EntityTable]
     FROM [mdm].[tblEntity]
     WHERE [Model_ID] = @Model_ID
       AND [Name] = @Entity_Name)
PRINT 'Processing Following Model ' + convert(varchar, @ModelName)
PRINT 'Model ID = ' + convert(varchar, @Model_ID)
PRINT 'Version ID = ' + convert(varchar, @Version_ID)
PRINT 'Entity ID = ' + convert(varchar, @Entity_ID)
PRINT 'Entity Name = ' + convert(varchar, @Entity_Name)
PRINT 'Entity Table Name ' + convert(varchar, @Entity_Table)
PRINT 'Validation Status ID being set to ' + convert(varchar, @ValidationStatus_ID) + ' for all members in ' + convert(varchar, @Entity_Name)
-- Create local temp table to hold member ids to update
CREATE TABLE #MemberIdList
( id int,
  Processed int )
-- Generate SQL to populate temp table
SET @sql = N'INSERT INTO #MemberIdList SELECT id, 0 FROM mdm.' + convert(varchar, @Entity_Table)
EXECUTE sp_executesql @sql
-- Create table variable to hold SQL commands and prepare for execute
DECLARE @Id int
DECLARE @MemberSQL TABLE
( id int,
  mdssql nvarchar(500),
  Processed int )
-- Generate one stored procedure call per member
WHILE (SELECT COUNT(*) FROM #MemberIdList WHERE Processed = 0) > 0
BEGIN
    SELECT TOP 1 @Id = id FROM #MemberIdList WHERE Processed = 0
    SET @sql = @sqlExec + convert(varchar, @Version_ID) + ',' + convert(varchar, @Entity_ID) + ',' + convert(varchar, @Id) + ',' + convert(varchar, @MemberType_ID) + ',' + convert(varchar, @ValidationStatus_ID)
    INSERT INTO @MemberSQL
    SELECT @Id, @sql, 0
    UPDATE #MemberIdList SET Processed = 1 WHERE id = @Id
END
DECLARE @sqlsyntax nvarchar(500)
WHILE (SELECT COUNT(*) FROM @MemberSQL WHERE Processed = 0) > 0
BEGIN
    SELECT TOP 1 @Id = id, @sqlsyntax = mdssql FROM @MemberSQL WHERE Processed = 0
    -- comment out the next line to not execute the update of Validation Status ID
    EXECUTE sp_executesql @sqlsyntax
    UPDATE @MemberSQL SET Processed = 1 WHERE id = @Id
END
-- Display the entity table to check the result
DECLARE @Entity_TableSQL nvarchar(500)
SET @Entity_TableSQL = 'SELECT * FROM [mdm].' + @Entity_Table
EXEC sp_executesql @Entity_TableSQL
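The code-generation step at the heart of the script can also be sketched outside the database. The helper below is hypothetical (`build_validation_updates` is not part of MDS) and only builds the statement text; it assumes the same positional argument order the script above passes to mdm.udpMemberValidationStatusUpdate (version, entity, member, member type, validation status), and the IDs in the usage lines are made-up examples.

```python
def build_validation_updates(version_id, entity_id, member_ids,
                             member_type_id=1, validation_status_id=4):
    """Generate one EXEC statement per member ID (text only, nothing is executed)."""
    statements = []
    for member_id in member_ids:
        args = [version_id, entity_id, member_id,
                member_type_id, validation_status_id]
        # int() ensures only integers can end up in the generated SQL text.
        arg_list = ", ".join(str(int(a)) for a in args)
        statements.append("EXEC mdm.udpMemberValidationStatusUpdate " + arg_list + ";")
    return statements


# Hypothetical IDs, for illustration only.
for statement in build_validation_updates(version_id=12, entity_id=34,
                                          member_ids=[101, 102, 103]):
    print(statement)
```

Generating the statements first and executing them second, as the T-SQL script does, lets you review exactly what will run before anything is changed.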
Here are several links I referenced:
The code for forcing the Validation of the Model is here. Jeremey Kashel’s Blog
Microsoft SQL Server 2012 Master Data Services 2/E Tyler Graham
I am sure this can be improved, please contact me with questions or suggestions.
-Ira Warren Whiteside
Agile MDM Data Modeling and Requirements incorporating a Metadata Mart.(Guerilla MDM)
Recently an acquaintance of mine asked for my thoughts on the approach to creating an MDM data model and the requirements artifacts for loading MDM.
I believe that with this client, and with most clients, the first thing we have to help them understand is that they have to take an organic and evolving, a.k.a. “agile”, approach, utilizing Extreme Scoping, in developing and fleshing out their MDM data model as well as the requirements.
High-level logical data models for customer, location, product, etc., such as the Open Data Model, are readily available and generally not extremely customized or different.
If we look at the MDM aspect of data modeling and requirements, I look at it as three layers:
1. The logical data model.
2. The sub models (reference models), which detail the localization in the sources for each domain; in essence these are our source-based reference models.
3. An associative model that allows for the nexus/mapping between the sources’ domain values and the master domain model.
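The three layers can be pictured with a small sketch. All names and values here (`SourceValue`, `MASTER_COUNTRY`, `ASSOCIATIVE`, the source systems and the country codes) are hypothetical; a real MDM hub would hold these in tables rather than dictionaries, but the shape of the mapping is the same.

```python
from dataclasses import dataclass

# Layer 1: the master domain from the logical model (hypothetical values).
MASTER_COUNTRY = {"US": "United States", "DE": "Germany"}


# Layer 2: a value as it appears locally in one source (source reference model).
@dataclass(frozen=True)
class SourceValue:
    source_system: str
    domain: str
    value: str


# Layer 3: the associative model, mapping each source value to a master key.
ASSOCIATIVE = {
    SourceValue("crm", "country", "USA"): "US",
    SourceValue("erp", "country", "U.S."): "US",
    SourceValue("erp", "country", "Deutschland"): "DE",
}


def to_master(source_system, domain, value):
    """Resolve a source value to (master key, master name), or None if unmapped."""
    key = ASSOCIATIVE.get(SourceValue(source_system, domain, value))
    return None if key is None else (key, MASTER_COUNTRY[key])


def unmapped(source_values):
    """Source values with no association yet: the anomalies profiling should surface."""
    return [sv for sv in source_values if sv not in ASSOCIATIVE]


print(to_master("crm", "country", "USA"))
print(to_master("erp", "country", "Deutschland"))
print(unmapped([SourceValue("crm", "country", "Canada"),
                SourceValue("erp", "country", "U.S.")]))
```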
It doesn’t really get interesting until you’re able to dive into the submodel and associative models and relationships creation and understand the issues that you’re dealing with and anomalies in the source data, which inherently involves quite a bit of data profiling and metadata analysis.
In my experience the creation of the MDM model and sub models involves three simultaneous and parallel tracks.
1. Meet with users and perform metric decomposition (defined in my slides), define the Information Value Chain, and create logical definitions of the metrics, the groupings of dimensions and their attributes, and subsequently the hierarchies that are required from an analysis perspective: a data dictionary.
1a. The business deliverable here is a business matrix which will show the relationship between the metrics and the dimensions in relation to the business processes and/or KPIs. I’ll send an example.
1b. The relationship of the above deliverable to the logical model is that the business deliverables, specifically the business matrix and the associated hierarchies, will drive the core required data elements from the KPI and therefore the MDM perspective. In addition they will drive the necessary hierarchies (relationships) and thereby the associations that will be required at the next level down, which will involve the source-to-target mapping required for loading the reference tables.
1c. Lastly, once we truly understand the client’s requirements with enough detail in terms of metric definitions and hierarchies, we can then select and incorporate reference models or industry experts to validate and vet our model. This should be relatively straightforward.
1d. From an MDM perspective, where multiple teams are simultaneously gathering information/requirements, we would want to use and have access to these artifacts, especially for new business process modeling or any BI requirements modeling.
2. Introduce the metadata mart concept and utilize local techniques and published capabilities (I have several that can easily be incorporated) to instantiate a metadata mart and incorporate metadata and data profiling as part of the data modeling process and the user awarement process.
2a. The deliverable here is a set of Excel pivot tables and/or reports that allow the users to analyze and understand the sources for domains and how they’re going to relate to the master data model, even at the logical level.
3. Capture relations in order to define the master domains, and document the required mappings and transformations in source-to-target matrices.
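The business matrix from track 1a is easy to prototype before it ever reaches Excel. The sketch below is illustrative only: the metric and dimension names and the helper functions are hypothetical, and the output is a simple comma-separated grid with an X wherever a metric is analyzed by a dimension.

```python
# Hypothetical business matrix: each metric (KPI) mapped to the dimensions it is analyzed by.
BUSINESS_MATRIX = {
    "net_sales": {"customer", "product", "date"},
    "on_time_delivery_pct": {"customer", "location", "date"},
    "inventory_turns": {"product", "location", "date"},
}


def dimensions_required(matrix):
    """All dimensions (candidate master domains) that any metric needs."""
    return sorted(set().union(*matrix.values()))


def metrics_using(matrix, dimension):
    """Which metrics depend on a given dimension."""
    return sorted(m for m, dims in matrix.items() if dimension in dims)


def render(matrix):
    """Render the matrix as CSV text: one row per metric, one column per dimension."""
    dims = dimensions_required(matrix)
    lines = ["metric," + ",".join(dims)]
    for metric in sorted(matrix):
        marks = ["X" if d in matrix[metric] else "" for d in dims]
        lines.append(metric + "," + ",".join(marks))
    return "\n".join(lines)


print(render(BUSINESS_MATRIX))
print(metrics_using(BUSINESS_MATRIX, "location"))
```

Dimensions that many metrics share are the ones to prioritize as master domains.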
This process will result in a comprehensive set of deliverables that in their final state will be the logical and physical data models for MDM. In addition the client will have the necessary data dictionary definitions as well as high-level source-to-target mappings.
My concern with presenting a high-level logical MDM model to users is that it is intuitively too simplistic and too straightforward. Obviously at the high level the MDM constructs look very logical for customer, location, products, etc.
They don’t expose the complexity and, frankly, the magic (hard work) from an MDM perspective of the underlying and supporting source reference models and associative models that are really the heart and soul of the final model. The sooner we can engage the client in an organized, facilitated, methodical process of simultaneously profiling and understanding their existing data and defining the final state, a.k.a. the MDM model that they are looking for, the better we’re going to be at managing long-term expectations.
There’s an old saying which I think applies here, especially in relation to users and expectations: “They get what they inspect, not what they expect.”
I know that this went on a bit long. I apologize; I certainly don’t mean to lecture. But I think the crux of the problem in a waterfall or traditional approach to creating an MDM model is overcome by following a more iterative or agile approach.
The reality is that the MDM model is the heart of the integration engine.