The challenge for businesses is to seek answers to questions. They do this with metrics (KPIs), knowing the relationships of the data, organized into the logical categories (dimensions) that make up the result, the answer to the question. This is what constitutes the Information Value Chain.
Let's assume you have a business problem, a business question that needs an answer, and you need to know the details of the data related to that question.
Information Value Chain
- Business is based on Concepts.
- People think in terms of Concepts.
- Concepts come from Knowledge.
- Knowledge comes from Information.
- Information comes from Formulas.
- Formulas determine Information relationships based on Quantities.
- Quantities come from Data.
- Data physically exists.
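To make the chain concrete, here is a minimal Python sketch (all names and numbers are hypothetical) that walks it bottom-up: raw data rows yield quantities, a formula relates the quantities, and the result is the information that answers the business question:

```python
# Hypothetical illustration of the Information Value Chain, bottom-up:
# Data -> Quantities -> Formula -> Information.

# Data: raw order records as they physically exist.
orders = [
    {"region": "East", "revenue": 1200.0, "cost": 800.0},
    {"region": "East", "revenue": 600.0,  "cost": 450.0},
    {"region": "West", "revenue": 900.0,  "cost": 700.0},
]

# Quantities: sums organized by a logical category (dimension = region).
def quantities_by_region(rows):
    totals = {}
    for row in rows:
        t = totals.setdefault(row["region"], {"revenue": 0.0, "cost": 0.0})
        t["revenue"] += row["revenue"]
        t["cost"] += row["cost"]
    return totals

# Formula: the KPI that relates the quantities.
def margin_pct(revenue, cost):
    return round(100.0 * (revenue - cost) / revenue, 2)

# Information: the metric per dimension member, the answer to the question.
info = {region: margin_pct(t["revenue"], t["cost"])
        for region, t in quantities_by_region(orders).items()}
print(info)  # {'East': 30.56, 'West': 22.22}
```

Drilling through is then just walking this structure in reverse: from the metric back to the formula, the quantities, and finally the raw rows.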
In today's fast-paced, high-tech business world, this basic navigation (drill-through) concept is fundamental, yet it seems to be overlooked in the zeal to embrace modern technology.
In our quest to embrace fresh technological capabilities, a business must realize that it can only truly discover new insights when it can validate them against its business model, the Information Value Chain that is currently creating its information and results.
Today, data needs to be deciphered into information so that formulas can be applied to determine relationships and validate concepts, in real time.
We are inundated with technical innovations and concepts; it's important to note that business is driving these changes, not necessarily technology.
Business is constantly striving for better insights, better information, and increased automation, all at lower cost; several of these drivers were examined in John Thuma's latest article.
Historically, these changes were few and far between; however, innovations in hardware and storage as well as software and compute have led to a rapid unveiling of newer concepts and new technologies.
Demystifying the path forward.
In this article we're going to review the basic principles of information governance required for a business to measure its performance, and explore how some of these new technological concepts connect to lowering cost.
To a large degree, I think we're going to find that why we do things has not changed significantly, only how: we now have different ways to do them.
It's important, while embracing new technology, to keep in mind that some of the basic concepts, ideas, and goals for how to properly structure and run a business have not changed, even though far more insights, information, and data are now available.
My point is that implementing these technological advances could be worthless to the business, and maybe even destructive, unless they are associated with an actual set of business information goals (measurements, KPIs) that link directly to understandable business deliverables.
Moreover, prior to even considering data science or attempting data mining, you should organize your datasets, capture their relationships, apply a "scoring" or "ranking" process, and be able to relate them to your business information model, the Information Value Chain, with the concept of quality applied in real time.
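As a minimal sketch of that scoring step (dataset names, dimensions, and weights are all hypothetical), one could rank candidate datasets by how well they cover the dimensions the Information Value Chain requires and how complete their data is:

```python
# Hypothetical sketch: rank candidate datasets against the business's
# Information Value Chain before any data-science work begins.

# Dimensions the business model (Information Value Chain) requires.
required_dimensions = {"customer", "product", "region", "date"}

# Candidate datasets, with the dimensions they carry and a completeness rate
# (share of non-null values) such as a profiling pass would produce.
datasets = {
    "orders":    {"dimensions": {"customer", "product", "date"}, "completeness": 0.98},
    "web_logs":  {"dimensions": {"customer"},                    "completeness": 0.75},
    "inventory": {"dimensions": {"product", "region"},           "completeness": 0.90},
}

def score(meta):
    # Weight coverage of required dimensions and data completeness equally.
    coverage = len(meta["dimensions"] & required_dimensions) / len(required_dimensions)
    return round(0.5 * coverage + 0.5 * meta["completeness"], 3)

ranking = sorted(datasets, key=lambda name: score(datasets[name]), reverse=True)
print([(name, score(datasets[name])) for name in ranking])
```

The exact weights matter less than the discipline: every dataset gets a score that ties it back to the business model before anyone mines it.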
The foundation for a business to navigate its Information Value Chain is an underlying Information Architecture. An Information Architecture typically involves a model or concept of information that is used and applied to activities requiring explicit details of complex information systems.
Subsequently, data management and databases are required; they form the foundation of your Information Value Chain and bring it back to the business goal. Let's take a quick look at the difference between relational database technology and graph technology as part of emerging big-data capabilities.
However, the timeframe of database technology's evolution has introduced a cultural aspect to implementing new technology: resistance to change. Businesses that are running their current operations with technology and people from the '80s and '90s have a different perception of a solution than folks from the 2000s.
Therefore, in this case regarding a technical solution, "perception is not reality"; awareness is. Businesses need to find ways to bridge the knowledge gap and increase awareness that simply embracing new technology will not fundamentally change why a business operates; however, it will affect how.
Relational databases were introduced in 1970, and graph database technology was introduced in the mid-2000s.
There are many topics within the current big-data concept to analyze; however, the foundation is the Information Architecture and the databases used to implement it.
There were other advancements in database technology in between as well, but let's focus on these two.
A 1970s relational database, based on mathematical set theory, let you pre-define the relationships of tabular data (tables), implement them in a hardened structure, and then query them by manually joining the tables through physically named attributes, gaining much better insight than previous database technology. However, if you needed a new relationship, it required manual effort and a migration from old to new; in addition, your answer was only as good as the hard-coded query that created it.
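To illustrate, here is a small sketch using Python's built-in sqlite3 module (the customer/orders schema is hypothetical): the relationship is hardened in the schema, and the query must manually name the join attributes:

```python
# Minimal relational illustration (sqlite3; table and column names hypothetical):
# relationships are pre-defined in the schema and queried by manually
# naming the join attributes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),  -- hardened relationship
    amount REAL
);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# The join path is hard-coded: relating orders to a new table (say, sales_rep)
# would require schema changes and data migration.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```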
In the mid-2000s the graph database was introduced. Based on graph theory, it defines relationships as tuples containing nodes and edges. Graphs represent things, and relationships (edges) describe the connections between things, which makes them an ideal fit for navigating relationships. Unlike conventional table-oriented databases, graph databases (for example, Neo4j, Neptune) represent entities and the relationships between them. New relationships can be discovered and added easily and without migration; basically, much less manual effort.
Nodes and Edges
Graphs are made up of ‘nodes’ and ‘edges’. A node represents a ‘thing’ and an edge represents a connection between two ‘things’. The ‘thing’ in question might be a tangible object, such as an instance of an article, or a concept such as a subject area. A node can have properties (e.g. title, publication date). An edge can have a type, for example to indicate what kind of relationship the edge represents.
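As a minimal illustration (all identifiers are hypothetical), nodes and typed edges can be sketched in plain Python; note how a new relationship type is just another edge, with no migration:

```python
# Minimal sketch of a property graph in plain Python (names hypothetical):
# nodes are 'things' with properties; edges are typed connections.
nodes = {
    "article:42": {"title": "Graph Databases 101", "published": "2019-03-01"},
    "subject:db": {"label": "Databases"},
    "author:ira": {"name": "Ira"},
}
edges = []  # (source, edge_type, target)

def connect(source, edge_type, target):
    edges.append((source, edge_type, target))

connect("article:42", "ABOUT", "subject:db")
connect("author:ira", "WROTE", "article:42")

# A new kind of relationship is just a new edge type -- no schema migration.
nodes["article:7"] = {"title": "Set Theory and SQL"}
connect("article:42", "CITES", "article:7")

def neighbors(node, edge_type):
    return [t for (s, e, t) in edges if s == node and e == edge_type]

print(neighbors("article:42", "ABOUT"))  # ['subject:db']
```

A real graph database indexes these structures for fast traversal, but the mental model is exactly this: things, properties, and typed connections.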
The takeaway: there are many spokes on the cultural wheel in a business today, encompassing business acumen, technology acumen, information relationships, and raw-data knowledge. While all are equally critical to success, the truly critical step is that the logical business model, defined as the Information Value Chain, is maintained and enhanced.
It is a given that all businesses desire to lower cost and gain insight into information. It is therefore imperative that a business maintain and improve its ability to provide accurate information that can be audited and traced, and to navigate the Information Value Chain. Data science can only be achieved after a business fully understands its existing Information Architecture and strives to maintain it.
Note, as I stated above, an Information Architecture is not your Enterprise Architecture. Information architecture is the structural design of shared information environments: the art and science of organizing and labelling websites, intranets, online communities, and software to support usability and findability, and an emerging community of practice focused on bringing principles of design, architecture, and information science to the digital landscape. Typically, it involves a model or concept of information that is used and applied to activities that require explicit details of complex information systems.
In essence, a business needs a Rosetta Stone in order to translate past, current, and future results.
In future articles we're going to explore and dive into how these new technologies can be utilized and, more importantly, how they relate to one another.
I was heavily involved in business intelligence, data warehousing, and data governance until several years ago. After many chaotic personal challenges, I have recently returned to professional practice and discovered that things have not changed that much in ten years. The methodologies and approaches are still relatively consistent; however, the tools and techniques have changed, and in my opinion not for the better. Without focusing on specific tools, I've observed that the core of data governance or MDM is enabling and providing a capability for classifying data into business categories or nomenclature, and it has really not improved.
- This basic traditional approach has not changed: in essence, an AI model predicts a metric and is wholly based on the integrity of its features, or dimensions.
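A toy sketch of that point (weights and feature names are invented): a simple linear scorer stands in for any AI model, and a single mis-coded dimension value changes the predicted metric:

```python
# Hypothetical sketch: a "model" predicting a metric from dimensional features.
# The prediction is only as good as the integrity of the features it is fed.
weights = {"region_east": 1.2, "is_repeat_customer": 0.8}

def predict_metric(features):
    # A linear scorer standing in for any AI model.
    return round(sum(weights[k] * v for k, v in features.items()), 3)

clean = {"region_east": 1, "is_repeat_customer": 1}
# A governance failure: the same fact mis-coded upstream.
corrupted = {"region_east": 0, "is_repeat_customer": 1}

print(predict_metric(clean))      # 2.0
print(predict_metric(corrupted))  # 0.8
```

No amount of modeling sophistication recovers a dimension that was classified incorrectly before the model ever saw it.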
Therefore I decided to update some of the techniques and code patterns I've used in the past regarding the Information Value Chain and record linkage, and we are going to make the results available with associated business and code examples, initially with SQL Server and Databricks plus Python.
My good friend Jordan Martz, of DataMartz fame, has greatly contributed to this old man's big-data enlightenment, as has Craig Campbell, in updating some of the basic classification capabilities required and critical for data governance. If you would like a more detailed version of the source as well as the test data, please send me an email at firstname.lastname@example.org. Stay tuned for more updates; soon we will add neural-network capability for additional automation of "governance type" classification and confidence monitoring.
Before we focus on functionality, let's focus on methodology.
Initially, understand the key metrics/KPIs to be measured, their formulas, and of course the business's expectations for their calculations.
Immediately gather file sources and complete profiling as specified in my original article, found here.
Implementing the processes in my metadata-mart article will provide numerous statistics for integer or float fields; however, there are some special considerations for text fields or smart codes.
Before beginning classification, you would employ similarity matching (fuzzy matching) as described here.
As I said, I posted the code for this process on SQL Server Central 10 years ago; here is a Python version.
Roll Your Own – Python Jaro_Winkler (Python)
Step 1a – import pandas
Step 2 – Import Libraries
from pyspark.sql.functions import input_file_name
from pyspark.sql.types import *
import datetime, time, re, os, pandas
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, NGram, HashingTF, IDF, Word2Vec, Normalizer, Imputer, VectorAssembler
from pyspark.ml import Pipeline
from mlflow.tracking import MlflowClient
from sklearn.cluster import KMeans
import numpy as np
Step 3 – Test JaroWinkler
Step 4a – Implement JaroWinkler (Fuzzy Matching)
def JaroWinkler(str1_in, str2_in):
    # Jaro similarity (the Winkler prefix boost is not applied here).
    if str1_in is None or str2_in is None:
        return None

    len_str1 = len(str1_in)
    len_str2 = len(str2_in)
    if len_str1 > len_str2:
        max_len = len_str1
    else:
        max_len = len_str2
    m = int(max_len / 2)  # matching window

    # Track which characters have already been matched.
    column_names = ['FStatus']
    df_temp_table1 = pandas.DataFrame(columns=column_names)
    df_temp_table2 = pandas.DataFrame(columns=column_names)
    iCounter = 1
    while iCounter <= len_str1:
        df_temp_table1.loc[iCounter - 1] = [0]
        iCounter = iCounter + 1
    iCounter = 1
    while iCounter <= len_str2:
        df_temp_table2.loc[iCounter - 1] = [0]
        iCounter = iCounter + 1

    # Count common characters within the matching window.
    common = 0
    i = 1
    while i <= len_str1:
        a1 = str1_in[i - 1]
        if m >= i:
            f = 1
            z = i + m
        else:
            f = i - m
            z = i + m
        if z > max_len:
            z = max_len
        if z > len_str2:
            z = len_str2
        while f <= z:
            a2 = str2_in[int(f - 1)]
            if a2 == a1 and df_temp_table2.loc[f - 1].at['FStatus'] == 0:
                common = common + 1
                df_temp_table1.at[i - 1, 'FStatus'] = 1
                df_temp_table2.at[f - 1, 'FStatus'] = 1
                break
            f = f + 1
        i = i + 1

    # Count transpositions among the matched characters (each counts 0.5).
    tr = 0.0
    i = 1
    z = 1
    while i <= len_str1:
        v1Status = df_temp_table1.loc[i - 1].at['FStatus']
        if v1Status == 1:
            while z <= len_str2:
                v2Status = df_temp_table2.loc[z - 1].at['FStatus']
                z = z + 1
                if v2Status == 1:
                    if str1_in[i - 1] != str2_in[z - 2]:
                        tr = tr + 0.5
                    break
        i = i + 1

    # Equally weighted Jaro formula.
    wcd = 1.0 / 3.0
    wrd = 1.0 / 3.0
    wtr = 1.0 / 3.0
    jaro_value = 0.0
    if common != 0:
        jaro_value = (wcd * common) / len_str1 + (wrd * common) / len_str2 + (wtr * (common - tr)) / common
    return round(jaro_value, 6)

Step 4b – Register JaroWinkler

spark.udf.register("JaroWinkler", JaroWinkler)
Step 8a – Bridge vs Master vs Associative
DROP TABLE IF EXISTS NameAssociative;
CREATE TABLE NameAssociative AS
SELECT
   sha2(replace(a.NameLookup, '%[^a-zA-Z0-9, ]%', ' '), 256) AS NameLookupCleaned
  ,a.NameLookupKey
  ,sha2(replace(b.NameInput, '%[^a-zA-Z0-9, ]%', ' '), 256) AS NameInputCleaned
  ,b.NameInputKey
  ,JaroWinkler(a.NameLookup, b.NameInput) AS MatchScore
  ,RANK() OVER (PARTITION BY a.DetailedBUMaster
                ORDER BY JaroWinkler(a.NameLookupCleaned, b.NameInputCleaned) DESC) AS MatchRank
FROM NameLookup AS a
CROSS JOIN NameInput AS b;
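For readers outside Spark, here is a pandas sketch of the same bridge step (table contents are invented, and difflib's similarity ratio stands in for the JaroWinkler UDF): cross-join the lookup and input names, score each pair, and rank within each lookup name:

```python
# A pandas sketch of the associative (bridge) step: cross-join the lookup and
# input names, score every pair, and rank matches per lookup name.
# difflib's SequenceMatcher ratio stands in for the JaroWinkler UDF here,
# purely to keep the example self-contained.
import pandas as pd
from difflib import SequenceMatcher

lookup = pd.DataFrame({"NameLookup": ["ACME CORP", "GLOBEX"]})
inputs = pd.DataFrame({"NameInput": ["ACME CORPORATION", "GLOBEX INC", "INITECH"]})

pairs = lookup.merge(inputs, how="cross")  # CROSS JOIN
pairs["MatchScore"] = [
    round(SequenceMatcher(None, a, b).ratio(), 3)
    for a, b in zip(pairs["NameLookup"], pairs["NameInput"])
]
# RANK() OVER (PARTITION BY NameLookup ORDER BY MatchScore DESC)
pairs["MatchRank"] = pairs.groupby("NameLookup")["MatchScore"].rank(
    method="min", ascending=False
)
best = pairs[pairs["MatchRank"] == 1].sort_values("NameLookup")
print(best[["NameLookup", "NameInput", "MatchScore"]])
```

The shape is identical to the SQL version: every candidate pair is scored, and the rank lets you keep only the best match (or the top N) per lookup name.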
Ira Whiteside / Jordan Martz / Victoria Stasiewicz
Advanced matching of business categories (terms) to technical artifacts (data), in relation to Master Data Management governance and/or feature engineering in artificial-intelligence model preparation.
We will be covering a top-down (business) and bottom-up, business-driven methodology, and the tactical capabilities, for the critical issue of associating business categories/dimensions with each other and, subsequently, with transactions. We will include templates and code (Python/SQL/Databricks, etc.).