We want to build an Enterprise Analytical capability by integrating the concepts for building a Metadata Mart with the facilities of the Semantic Web:
- Metadata Mart Source
  - (Metadata Mart as-is) Source profiling (column, domain, and relationship)
  - + (Metadata Mart plus Metadata Vocabulary) stored as triples (subject-predicate-object) (SSIS text mining)
  - + (Metadata Mart plus) create the Metadata Vocabulary following RDFa, applied to the Metadata Mart triples (SSIS text mining + fuzzy matching, possibly SPARQL)
- Bridge to RDFa – JSON-LD via schema.org
  - Master Data Vocabulary with lineage (Metadata Vocabulary + Master Vocabulary mapped to metacontent statements), based on the schema.org Person type
  - Creates links to legacy data in the data warehouse
  - + RDFa applied to web pages
  - + JSON-LD applied to web pages
  - + any triples from any source
- Semantic Self-Service BI
  - Metadata Mart Source + Bridge to RDFa
I have spent quite a while on this, and I believe there is a good deal of merit in taking the domain data and column profile data collected for the metadata mart and organizing them as triples.
The basis for both JSON-LD and RDFa is the collection of data as triples. Delving into this a bit deeper, I believe that with the proper mapping of object references and the derivation of the appropriate predicates as values are collected, we could gain many of the same benefits, while also bringing in the web data being collected and thereby linking it back to source data.
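As a rough sketch of what that could look like, the snippet below stores a few column-profile facts from a Metadata Mart as subject-predicate-object triples. It uses the Python rdflib library rather than SSIS simply to keep the example self-contained, and the `ex:` namespace and predicate names (`hasColumn`, `dataType`, `distinctCount`, `nullRatio`) are invented for illustration; in practice they would be drawn from the Metadata Vocabulary.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Hypothetical namespace for the Metadata Vocabulary; the real terms
# would come from the Metadata Mart / MDM, not be hard-coded like this.
EX = Namespace("http://example.org/metadata-vocabulary/")

g = Graph()
g.bind("ex", EX)

# One profiled table and column, expressed as triples.
table = EX["table/Customer"]
column = EX["table/Customer/column/LastName"]

g.add((table, RDF.type, EX.SourceTable))
g.add((table, EX.hasColumn, column))
g.add((column, RDF.type, EX.SourceColumn))
g.add((column, EX.columnName, Literal("LastName")))
g.add((column, EX.dataType, Literal("varchar(50)")))
g.add((column, EX.distinctCount, Literal(10423, datatype=XSD.integer)))
g.add((column, EX.nullRatio, Literal(0.002, datatype=XSD.decimal)))

# Serialize as Turtle just to show the resulting triples.
print(g.serialize(format="turtle"))
```

Once the profile data is in this shape it can be serialized to Turtle or JSON-LD and queried with SPARQL alongside triples from any other source.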
Consider the following excerpt regarding metadata structures, metadata syntax, and the vocabularies used to assemble metacontent statements:
“Metadata structures
Metadata (metacontent), or more correctly, the vocabularies used to assemble metadata (metacontent) statements, are typically structured according to a standardized concept using a well-defined metadata scheme, including: metadata standards and metadata models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries, and metadata registries can be used to apply further standardization to the metadata. Structural metadata commonality is also of paramount importance in data model development and in database design.
Metadata syntax
Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent).[11] A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML, and RDF.[12]
A common example of (guide) metacontent is the bibliographic classification, the subject, the Dewey Decimal class number. There is always an implied statement in any “classification” of some object. To classify an object as, for example, Dewey class number 514 (Topology) (i.e. books having the number 514 on their spine) the implied statement is: “<book><subject heading><514>”. This is a subject-predicate-object triple, or more importantly, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of some structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement i.e. “metacontent = metadata + master data”. All these elements can be thought of as “vocabulary”. Both metadata and master data are vocabularies which can be assembled into metacontent statements.”
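To make the quoted idea concrete, here is a minimal sketch of that Dewey metacontent statement assembled from the two vocabularies. The namespaces and term names are hypothetical; the point is only that the class and attribute come from the metadata vocabulary while the value comes from the controlled master (reference) vocabulary.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

# Hypothetical metadata vocabulary (classes and attributes) ...
META = Namespace("http://example.org/metadata/")
# ... and a hypothetical master/reference vocabulary (controlled values).
MASTER = Namespace("http://example.org/master/dewey/")

g = Graph()
g.bind("meta", META)
g.bind("master", MASTER)

# The metacontent statement "<book><subject heading><514>":
#   the class and attribute come from the metadata vocabulary,
#   the value 514 (Topology) comes from the master (reference) data.
book = URIRef("http://example.org/catalog/book/12345")  # hypothetical item
g.add((book, RDF.type, META.Book))
g.add((book, META.subjectHeading, MASTER["514"]))  # Dewey 514 = Topology

print(g.serialize(format="turtle"))
```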
The Metadata Mart serves as the source for the Metadata Vocabulary, and MDM serves as the source for the Master Data Vocabulary.
For the Master Data Vocabulary, consider schema.org, which defines most of the schemas we need. Consider the following excerpt of the schema.org Person properties, where the properties act as predicates and the expected types as objects:
A person (alive, dead, undead, or fictional).
| Property | Expected Type | Description |
| --- | --- | --- |
| Properties from Person | | |
| additionalName | Text | An additional name for a Person, can be used for a middle name. |
| address | PostalAddress | Physical address of the item. |
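On the JSON-LD side of the bridge, a master-data person record could be expressed in schema.org terms roughly as follows. The record itself is invented for illustration, and the `@id` pointing back at an MDM golden record is an assumption about how the linkage might be carried; only `@context`, `Person`, `name`, `additionalName`, `address`, and `PostalAddress` come from schema.org.

```python
import json

# A hypothetical master-data record expressed as schema.org JSON-LD.
# "@id" ties the web-facing record back to the MDM golden record so the
# triples it yields can be conformed with those from the Metadata Mart.
person_jsonld = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "http://example.org/mdm/person/000123",  # hypothetical MDM key
    "name": "Jane Q. Public",
    "additionalName": "Quincy",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "postalCode": "00000",
    },
}

print(json.dumps(person_jsonld, indent=2))
```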
The key is to link source data across the Enterprise by mapping the Business Vocabulary from MDM to the source-data Metadata Vocabulary from the Metadata Mart, thereby conforming the triples collected both internally and externally.
In essence, information from web applications can be integrated with the dimensional metadata mart, the MDM model, and existing data warehouses, providing lineage for selected raw data from the web through to Enterprise conformed dimensions that have gone through data quality processes.
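As a sketch of how that conformance might be queried, the snippet below loads hypothetical exports of Metadata Mart triples and web-collected JSON-LD into one rdflib graph and joins them with SPARQL on a shared MDM key. The file names and the `ex:mdmKey` / `ex:derivedFromColumn` predicates are assumptions, not existing artifacts, and JSON-LD parsing assumes an rdflib version with built-in JSON-LD support (6+) or the rdflib-jsonld plugin.

```python
from rdflib import Graph

# Two hypothetical inputs: a Turtle export of Metadata Mart triples and
# JSON-LD collected from web pages, loaded into a single graph.
g = Graph()
g.parse("metadata_mart_triples.ttl", format="turtle")
g.parse("web_person_data.jsonld", format="json-ld")

# Conform the two sources: find web-collected Person records whose MDM key
# also appears in the Metadata Mart lineage, pulling the source column that
# feeds the conformed dimension attribute.
query = """
PREFIX schema: <https://schema.org/>
PREFIX ex:     <http://example.org/metadata-vocabulary/>

SELECT ?person ?name ?sourceColumn
WHERE {
    ?person a schema:Person ;
            schema:name ?name ;
            ex:mdmKey ?key .
    ?dimAttribute ex:mdmKey ?key ;
                  ex:derivedFromColumn ?sourceColumn .
}
"""

for person, name, source_column in g.query(query):
    print(person, name, source_column)
```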
Please let me know your thoughts.