Category Archives: Data Governance

Data Governance – Navigating the Information Value Chain

The challenge for businesses is to seek answers to questions, they do this with Metrics (KPI’s) and know the relationships of the data, organized by logical categories(dimensions) that make up the result or answer to the question. This is what constitutes the Information Value Chain

Navigation

Let’s assume that you have a business problem, a business question that needs answers and you need to know the details of the data related to the business question.

Information Value Chain

 

  • Business is based on Concepts.
  • People thinks in terms of Concepts.
  • Concepts come from Knowledge.
  • Knowledge comes from Information.
  • Information comes from Formulas.
  • Formulas determine Information relationships based on quantities.
  • Quantities come from Data.
  • Data physically exist.

In today’s fast-paced high-tech business world this basic navigation (drill thru) business concept is fundamental and seems to be overlooked, in the zeal to embrace modern technology

In our quest to embrace fresh technological capabilities, a business must realize you can only truly discover new insights when you can validate them against your business model or your businesses Information Value Chain, that is currently creating your information or results.

Today data needs to be deciphered into information in order to apply formulas to determine relationships and validate concepts, in real time.

We are inundated with technical innovations and concepts it’s important to note that business is driving these changes not necessarily technology

Business is constantly striving for a better insights, better  information and increased automation as well as the lower cost while doing these things several of these were examined

Historically though these changes were few and far between however innovation in hardware storage(technology) as well as software and compute innovations have led to a rapid unveiling of newer concepts as well as new technologies     

Demystifying the path forward.

In this article we’re going to review the basic principles of information governance required for a business measure their performance. As well as explore some of the connections to some of these new technological concepts for lowering cost       

To a large degree I think we’re going to find that why we do things has not changed significantly it’s just how, we know have different ways to do them.

It’s important while embracing new technology to keep in mind that some of the basic concepts, ideas, goals on how to properly structure and run a business have not changed even though many more insights and much more information and data is now available.

My point is in the implementing these technological advances could be worthless to the business and maybe even destructive, unless they are associated with a actual set of Business Information Goals(Measurements KPI’s) and they are linked directly with understandable Business deliverables.

And moreover prior to even considering or engaging a data science or attempt data mining you should organize your datasets capturing the relationships and apply a “scoring” or “ranking” process and be able to relate them to your business information model or Information Value Chain, with the concept of quality applied real time.  

The foundation for a business to navigate their Information Value Chain is an underlying Information Architecture. An Information Architecture typically, involves a model or concept of information that is used and applied to activities which require explicit details of complex information systems.

Subsequently a data management and databases are required, they form the foundation of your Information Value Chain, to bring this back to the Business Goal. Let’s take a quick look at the difference between relational database technology and graph technology as a part of emerging big data capabilities.

However, considering the timeframe for database technology evolution, has is introduced a cultural aspect of implementing new technology changes, basically resistance to change. Business that are running there current operations with technology and people form the 80s and 90s have a different perception of a solution then folks from the 2000s. 

Therefore, in this case regarding a technical solution “perception is not reality awarement is”. Business need to find ways to bridge the knowledge gap and increase awarement that simply embracing new technology will not fundamentally change the why a business is operates , however it will affect how.

Relational databases were introduced in 1970, and graph database technology was introduced in the mid to 2000

There are many topics included in the current Big Data concept to analyze, however the foundation is the Information Architecture, and the databases utilized to implement it.

There were some other advancements in database technology in between also however let’s focus on these two

History

1970

In a 1970s relational database, Based on mathematical Set  theory, you could pre-define the relationship of tabular (tables)   , implement them in a hardened structure, then query  them by manually joining the tables thru physically naming attributes and gain much better insight than previous database technology however if you needed a new relationship it would require manual effort and then migration of old to new , In addition your answer it was only good as the hard coding query created

2020

In  mid-2000’s the graph database was introduced , based on graph theory, that defines the relationships as tuples  containing nodes  and edges.  Graphs represent things and relationships events describes connections between things, which makes it an ideal fit for a navigating relationship. Unlike conventional table-oriented databases, graph databases (for example Neo4J, Neptune) represent entities and relationships between them. New relationships can be discovered and added easily and without migration, basically much less manual effort. 

Nodes and Edges

Graphs are made up of ‘nodes’ and ‘edges’. A node represents a ‘thing’ and an edge represents a connection between two ‘things’. The ‘thing’ in question might be a tangible object, such as an instance of an article, or a concept such as a subject area. A node can have properties (e.g. title, publication date). An edge can have a type, for example to indicate what kind of relationship the edge represents.

Takeaway.

The takeaway there are many spokes on the cultural wheel, in a business today, encompassing business acumen, technology acumen and information relationships and raw data knowledge and while they are all equally critical to success, the absolute critical step is that the logical business model defined as the Information Value Chain is maintained and enhanced.

It is a given that all business desire to lower cost and gain insight into information, it is imperative that a business maintain and improve their ability to provide accurate information that can be audited and traceable and navigate the Information Value Chain Data Science can only be achieved after a business fully understand their existing Information Architecture and strive to maintain it.

Note as I stated above an Information Architecture is not your Enterprise Architecture or even Data Architecture Information Relationships it is the hierarchical design of shared information environments; the art and science of organizing and labelling gGossary terms, transactions to support usability and findability; in an emerging community of practice focused on bringing principles of design, architecture and information science to the digital landscape. Typically, it involves a model or concept of information that is used and applied to activities which require explicit details of complex information systems.

In essence, a business needs a Rosetta stone in order translate past, current and future results.

Rosetta Stone

In future articles we’re going to explore and dive into how these new technologies can be utilized and more importantly how they relate to all the technologies.

   

 

Chocolate cake, MDM, data quality, machine learning and creating the information value chain’

The primary take away from this article will be that you don’t start your Machine Learning project, MDM , Data Quality or Analytical project with “data” analysis, you start with the end in mind, the business objective in mind. We don’t need to analyze data to know what it is, it’s like oil or water or sand or flour.

Unless we have a business purpose to use these things, we don’t need to analyze them to know what they are. Then because they are only ingredients to whatever we’re trying to make. And what makes them important is to what degree they are part of the recipe , how they are associated

Business Objective: Make Desert

Business Questions: The consensus is Chocolate Cake , how do we make it?

Business Metrics: Baked Chocolate Cake

Metric Decomposition: What are the ingredients and portions?

2/3 cup butter, softened

1-2/3 cups sugar

3 large eggs

2 cups all-purpose flour

2/3 cup baking cocoa

1-1/4 teaspoons baking soda

1 teaspoon salt

1-1/3 cups milk

Confectioners’ sugar or favorite frosting

So here is the point you don’t start to figure out what you’re going to have for dessert by analyzing the quality of the ingredients. It’s not important until you put them in the context of what you’re making and how they relate in essence, or how the ingredients are linked or they are chained together.

In relation to my example of desert and a chocolate cake, an example could be, that you only have one cup of sugar, the eggs could’ve set out on the counter all day, the flour could be coconut flour , etc. etc. you make your judgment on whether not to make the cake on the basis of analyzing all the ingredients in the context of what you want to, which is a chocolate cake made with possibly warm eggs, cocunut flour and only one cup of sugar.

Again belaboring this you don’t start you project by looking at a single entity column or piece of data, until you know what you’re going to use it for in the context of meeting your business objectives.

Applying this to the area of machine learning, data quality and/or MDM lets take an example as follows:

Business Objective: Determine Operating Income

Business Questions: How much do we make, what does it cost us.

Business. Metrics: Operating income = gross income – operating expenses – depreciation – amortization.

Metric Decomposition: What do I need to determine a Operating income?

Gross Income = Sales Amount from Sales Table, Product, Address

Operating Expense = Cost from Expense Table, Department, Vendor

Etc…

Dimensions to Analyze for quality.

Product

Address

Department

Vendor

You may think these are the ingredients for our chocolate cake in regards to business and operating income however we’re missing one key component, the portions or relationship, in business, this would mean the association,hierarchy or drill path that the business will follow when asking a question such as why is our operating income low?

For instance the CEO might first ask what area of the country are we making the least amount of money?

After that the CEO may ask well in that part of the country, what product is making the least amount of money and who manages it, what about the parts suppliers?

Product => Address => Department => Vendor

Product => Department => Vendor => Address

Many times these hierarchies, drill downs, associations or relationships are based on various legal transaction of related data elements the company requires either between their customers and or vendors.

The point here is we need to know the relationships , dependencies and associations that are required for each business legal transaction we’re going to have to build in order to link these elements directly to the metrics that are required for determining operating income, and subsequently answering questions about it.

No matter the project, whether we are preparing for developing a machine learning model, building an MDM application or providing an analytical application if we cannot provide these elements and their associations to a metric , we will not have answered the key business questions and will most likely fail.

The need to resolve the relationships is what drives the need for data quality which is really a way of understanding what you need to do to standardize your data. Because the only way to create the relationships is with standards and mappings between entities.

The key is mastering and linking relationships or associations required for answering business questions, it is certainly not just mastering “data” with out context.

We need MASTER DATA RELATIONSHIP MANAGEMENT

not

MASTER DATA MANAGEMENT.

So final thoughts are the key to making the chocolate cake is understanding the relationships and the relative importance of the data/ingredients to each other not the individual quality of each ingredient.

This also affects the workflow, Many inexperienced MDM Data architects do not realize that these associations form the basis for the fact tables in the analytical area. These associations will be the primary path(work flow) the data stewards will follow in performing maintenance , the stewards will be guided based on these associations to maintain the surrounding dimensions/master entities. Unfortunately instead some architects will focus on the technology and not the business. Virtually all MDM tools are model driven APIs and rely on these relationships(hierarchies) to generate work flow and maintenance screen generation. Many inexperienced architects focus on MVP(Minimal Viable Product), or technical short term deliverable and are quickly called to task due to the fact the incurred cost for the business is not lowered as well as the final product(Chocolate Cake) is delayed and will now cost more.

Unless the specifics of questionable quality in a specific entity or table or understood in the context of the greater business question and association it cannot be excluded are included.

An excellent resource for understanding this context can we found by following: John Owens

Final , final thoughts, there is an emphasis on creating the MVP(Minimal Viable Product) in projects today, my take is in the real world you need to deliver the chocolate cake, simply delivering the cake with no frosting will not do,in reality the client wants to “have their cake and eat it too”.

Note:

Operating Income is a synonym for earnings before interest and taxes (EBIT) and is also referred to as “operating profit” or “recurring profit.” Operating income is calculated as: Operating income = gross incomeoperating expenses – depreciation – amortization.