
Why AI Governance is Actually Data Governance in a Helmet: 5 Surprising Truths About the New Data Era

History is an evolutionary arc of innovation, and every leap—from the wheel to the internet—has been met with a cocktail of excitement and existential dread. When the wheel was invented, humans didn’t stop walking; they simply stopped walking everywhere, enabling a scale of trade previously thought impossible. Today, the conversation surrounding Artificial Intelligence follows a similar pattern, oscillating between the marvel of autonomous agents and the fear of widespread job replacement.

However, beneath the hype, a more immediate technical crisis is unfolding. Most AI projects fail not because of model limitations, but because of a “silent saboteur” known as data chaos. Gartner estimates that through 2026, 60% of AI projects lacking AI-ready data will be abandoned. To survive this shift, we must recognize that “AI Governance” isn’t a futuristic new discipline. It is foundational Data Governance wearing a helmet—a protective layer of adversarial robustness and ethical guardrails designed for a world where machines consume data at scale.

1. The Architectural Formula: AI Governance = Data Governance

For the modern Data Architect, the realization is stark: you cannot govern an AI agent without first governing the data feeding it. We often hear about agent safety and model alignment as if they were entirely new concepts. In reality, the most dangerous AI failures—hallucinations, PII leaks, and unpredictability—originate in the data pipelines, access controls, and lineage that engineers have managed for years.

Many of the “new” requirements for agentic systems are simply existing data engineering principles rebranded. Promoting an agent safely across environments is essentially version control and production approval; managing agent risk is a new interface for schema validation and drift detection. For those of us building RAG (Retrieval-Augmented Generation) pipelines, our existing skills in RBAC (Role-Based Access Control) and provenance are more relevant than ever.

“AI governance is not something you start after your data platform is built—it is something that emerges from the maturity of your data platform. The formula is simple: AI Governance = Data Governance.” — Egezon Baruti
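To make the point concrete, here is a minimal, illustrative sketch of how existing RBAC and provenance practices carry straight into a RAG retrieval step. The in-memory index, role names, and fields are hypothetical rather than any particular product's API; a real pipeline would rank results by embedding similarity, but the access filter would still sit in front of the model.

```python
# Illustrative only: existing RBAC and provenance applied to a RAG retrieval step.
# The in-memory "index", roles, and fields are hypothetical, not a specific product API.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str                      # provenance: where the chunk came from
    allowed_roles: set = field(default_factory=set)

INDEX = [
    Chunk("Q3 revenue by region ...", "finance/q3_report.pdf", {"finance", "exec"}),
    Chunk("Public product FAQ ...", "website/faq.html", {"finance", "exec", "support"}),
]

def retrieve(query: str, user_roles: set, top_k: int = 5):
    """Return only chunks the caller is entitled to see, with provenance attached."""
    visible = [c for c in INDEX if c.allowed_roles & user_roles]
    # A real system would rank `visible` by embedding similarity to `query`;
    # the point is that the access filter runs *before* anything reaches the model.
    return [(c.text, c.source) for c in visible[:top_k]]

print(retrieve("quarterly revenue", user_roles={"support"}))
```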

2. AI Isn’t Coming for Your Job—It’s Coming for Your “Data Chaos”

The primary barrier to AI success isn’t a lack of compute; it is the systemic dysfunction born from fragmentation and inconsistency. We are currently living through a staggering imbalance in the data economy: 90% of the world’s data was generated in just the last two years, yet only 3% of the enterprise workforce are data stewards. This gap creates a bottleneck where data turns from an asset into a liability.

Several forces drive this chaos in the modern enterprise:

  • Source Proliferation: Data streaming from IoT, APIs, and legacy databases with conflicting semantics.
  • Operational Complexity: Integration debt accumulated as digital ecosystems expand.
  • Uncontrolled Growth: Millions of new data objects generated daily, outstripping human capacity to govern them manually.

The shift currently underway moves the professional from an Executor—buried in manual curation and quality firefighting—to an Orchestrator. In this new era, we oversee AI agents that handle the mechanical toil of documentation and anomaly detection, allowing us to focus on strategic “semantic trust.”

3. Prompt Engineering is the New Data Validation Layer

We are witnessing a transition from rule-based validation (rigid SQL checks and regex) to reasoning-based validation. Traditional systems can check if a field is a string, but they struggle with logic. An LLM-powered validator, however, can recognize that a birth year of “2025” for a current executive is a logical impossibility, even if the syntax is perfect.

This shift transforms the Prompt Engineer into a “Data Auditor” who evaluates semantic coherence rather than just syntax. By treating validation as a reasoning problem, organizations have seen an 87% reduction in false positives compared to traditional systems. In high-paying technical roles, prompts are no longer just “chats”; they are treated as structured code that must be version-controlled, tested for model drift, and scaled across the enterprise.

“Prompt engineering changes the game by treating validation as a reasoning problem… It is a shift from enforcing constraints to evaluating coherence.” — Dextra Labs
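As a rough sketch of what reasoning-based validation can look like in practice, the snippet below hands a record to a model and asks it to flag values that are syntactically valid but logically impossible. The prompt wording is illustrative, and `call_llm` is a placeholder for whatever model client you use, not a specific vendor's API.

```python
# Sketch of reasoning-based validation: a prompt, not a regex, judges semantic coherence.
# `call_llm` is a placeholder for your model client; the prompt wording is illustrative.
import json

VALIDATOR_PROMPT = """You are a data auditor. List any values in the record below that are
syntactically valid but logically impossible or implausible, and explain why.
Return JSON of the form {{"issues": [{{"field": "...", "reason": "..."}}]}}.

Record: {record}"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in your provider's chat/completions call

def validate(record: dict) -> list:
    raw = call_llm(VALIDATOR_PROMPT.format(record=json.dumps(record)))
    return json.loads(raw).get("issues", [])

# A birth year of 2025 for a sitting executive passes a type check but should be flagged here:
# validate({"name": "J. Doe", "title": "CFO", "birth_year": 2025})
```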

4. The “0.5% Reality” and the Value of the Horseback Rider

While “Prompt Engineer” is a buzzworthy title, ArXiv research reveals that dedicated roles with this exact name represent less than 0.5% of job postings. However, the skill profile for these roles is distinct and highly valuable. Success in the 21st-century data landscape requires a hybrid profile: AI knowledge (22.8%), communication (21.9%), and creative problem-solving (15.8%).

In this environment, Subject Matter Expertise (SME) is becoming more valuable than the ability to write boilerplate code. Consider a unique example: a professional with deep expertise in horseback riding can craft prompts that generate content exactly tailored to that niche’s nuances, whereas a generalist programmer cannot.

The market reflects this value. In 2026, Glassdoor reports the average salary for these roles is $128,000, with senior roles commanding up to $224,000 in sectors like Media and Communication.

  • Information Technology: $117,000 – $168,000
  • Management & Consulting: $103,000 – $169,000
  • Media & Communication: $140,000 – $224,000

5. Security Beyond Encryption: The Era of Ethical Guardrails

Modern security is no longer just about who can see the data; it is about adversarial robustness. As we integrate frameworks like DAMA-DMBOK with the NIST AI Risk Management Framework (RMF), we move toward a “Map, Measure, and Manage” approach.

The “helmet” of AI governance requires a new checklist of technical guardrails:

  • Bias Detection: Swapping demographic attributes (gender, age) in input data to ensure the model’s tone or recommendation remains neutral.
  • PII Detection: Ensuring RAG pipelines don’t inadvertently surface Social Security numbers or private addresses.
  • Proactive Jailbreaking: Attempting to bypass your own safety rules using urgent tones or “peer pressure” tactics to identify weaknesses in system prompts.
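As a minimal sketch, assuming simplified patterns rather than a full safety suite, two of these guardrails might look like the checks below: a PII filter over RAG answers and a demographic-swap probe that asks whether the model's recommendation changes when only an attribute changes.

```python
# Illustrative guardrail checks; the regex and the attribute swap are simplified examples,
# not a complete safety or bias-testing suite.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(answer: str) -> str:
    """Keep a RAG answer from surfacing Social Security numbers verbatim."""
    return SSN_PATTERN.sub("[REDACTED]", answer)

def bias_probe(generate, prompt_template: str, attributes=("male", "female")) -> bool:
    """Swap a demographic attribute and check that the recommendation stays the same.
    `generate` is whatever function wraps your model call."""
    outputs = {attr: generate(prompt_template.format(attr=attr)) for attr in attributes}
    return len(set(outputs.values())) == 1  # True if the swap did not change the answer

print(redact_pii("The applicant's SSN is 123-45-6789."))
```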

In a production environment, “Explainable AI” is the ultimate form of trust. Transparency—the ability to trace a model’s decision back to its training data lineage—is now the primary form of security.

Conclusion: From Rules to Reasoning

The leap from rule-based compliance to intelligent reasoning is the fundamental change of our era. The most successful tech strategists won’t be those who build the most complex code, but those who “teach the AI how to think responsibly.”

The frontier of data quality isn’t defined by stricter rules, but by asking better questions. As you look at your own technical roadmap, ask yourself: are you building your AI strategy on a foundation of trust, or a foundation of chaos? The answer lies not in your models, but in the maturity of your data governance.

The $350k Transition: 5 Surprising Realities of Becoming an AI Engineer

The software development landscape is undergoing its most dramatic transformation since the shift from assembly to high-level languages. By 2026, projections suggest that 90% of all code will be AI-generated. This reality has sparked a wave of anxiety, but the data tells a more nuanced story of bifurcation rather than obsolescence.

While entry-level tech hiring decreased by 25% year-over-year in 2024 and employment for developers aged 22–25 declined nearly 20%, the demand for senior talent capable of managing AI systems has reached a fever pitch. We are witnessing the death of the “Syntax Memorizer”—the 2022-style developer whose primary value was handwriting functional lines. In their place emerges the System Orchestrator: an engineer who leverages AI to deliver the output once expected from a team of ten.

Underneath the hype, a new layer of engineering work has emerged. This isn’t research or model training; it is product engineering where AI is a system component. If you are a full-stack architect looking to future-proof your career, the transition to becoming an AI engineer requires a deliberate evolution of your technical stack and mindset.

1. Prompting is Now “Table Stakes” (Master Context Engineering)

Many developers remain fixated on the surface layer: perfecting prompts or chasing the latest “hacks.” While prompt engineering was the buzzy role of 2023, it has rapidly become a standard capability, much like using an IDE or keyboard shortcuts.

The professional differentiator is no longer just the prompt; it is Context Engineering. This is the rigorous discipline of managing the non-prompt elements supplied to a model—metadata, API tool definitions, and token budgeting—to ensure reliability and provenance. Your value is shifting from a “Code Writer” to an architect of the environment in which the AI operates.
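A minimal sketch of that shift is below, with illustrative field names and a crude word-count stand-in for real token accounting: the work is in assembling metadata, tool definitions, and retrieved context under a budget before the model ever sees a prompt.

```python
# Sketch of "context engineering": assembling the non-prompt inputs (metadata, tool
# definitions, a token budget) before any model call. Word counting here is a rough
# stand-in; a real system would use the model's tokenizer.
TOOLS = [
    {"name": "lookup_order", "description": "Fetch an order by ID",
     "parameters": {"order_id": "string"}},
]

def build_context(user_query: str, retrieved_chunks: list, token_budget: int = 3000) -> dict:
    context_text, used = [], 0
    for chunk in retrieved_chunks:            # fill the budget in retrieval order
        cost = len(chunk.split())
        if used + cost > token_budget:
            break
        context_text.append(chunk)
        used += cost
    return {
        "system": "Answer only from the supplied context. Cite the source of each claim.",
        "tools": TOOLS,
        "context": context_text,
        "metadata": {"tokens_used": used, "budget": token_budget},
        "user": user_query,
    }

payload = build_context("What is our refund window?", ["Refunds are honored within 30 days ..."])
print(payload["metadata"])
```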

As Andrew Ng points out, you cannot simply “vibe code” your way to production-grade systems:

“Without understanding how computers work, you can’t just ‘vibe code’ your way to greatness. Fundamentals are still important, and for those who additionally understand AI, job opportunities are numerous!”

2. RAG is the Single Most Critical Skill (The Undervalued Infrastructure)

If you commit to one technical skill this year, make it Retrieval-Augmented Generation (RAG). While social media is captivated by flashy autonomous agents, RAG is the “undervalued infrastructure layer” that startups and enterprises are actually paying for.

RAG is the process of providing a Large Language Model (LLM) with proprietary data at the right time to prevent hallucinations. In practice, this involves:

  • Converting documents into embeddings (numerical vectors).
  • Managing vector databases like Pinecone or Qdrant for high-dimensional storage.
  • Designing semantic retrieval systems that allow models to interact with live, changing data.

This is the foundation of useful AI products. For example, when a DoorDash driver asks how to handle spilled pickle juice, a RAG system retrieves the specific internal protocol for vehicle maintenance to provide an accurate, human-readable answer. Similarly, Spotify uses these patterns to find songs with semantically similar lyrics. Mastering the “boring” plumbing of data flow is what separates a hobbyist from a $350k IC.
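Here is a deliberately tiny, fully in-memory sketch of the same flow. The bag-of-words “embedding” and the canned documents stand in for a real embedding model, a vector database such as Pinecone or Qdrant, and the downstream LLM call; the shape of the pipeline is the point.

```python
# Minimal, fully in-memory RAG sketch. The bag-of-words "embedding" and the toy documents
# stand in for a real embedding model, a vector database, and the LLM call.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())          # toy stand-in for a dense embedding

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

DOCS = [
    "Spilled liquids in the delivery vehicle: blot, rinse with water, and log the incident.",
    "Customer refunds are issued within 5 business days of the request.",
]
INDEX = [(embed(d), d) for d in DOCS]             # the "vector database"

def retrieve(question: str, top_k: int = 1) -> list:
    q = embed(question)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:top_k]]

context = "\n".join(retrieve("How do I clean up spilled pickle juice in my car?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How should I handle the spill?"
# `prompt` would now go to the LLM of your choice; the retrieval step above is the RAG part.
```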

3. Workflows Over Agents (The “Deterministic” Advantage)

The term “AI Agent” is dangerously overloaded. In a hype-driven market, non-technical CEOs often demand “autonomous agents” that run until a task is done. In reality, these uncontrolled agentic loops often lead to exploding token costs and non-deterministic failures.

The superior architectural pattern is the controlled workflow. As an engineer, your job is to create deterministic outcomes in a non-deterministic world. This requires:

  • Human-in-the-loop patterns: Designing checkpoints for critical decisions.
  • Orchestration: Utilizing patterns like “ReAct” or “Orchestrator” to classify and route tasks programmatically.
  • FinOps Mindset: Implementing observability tools like Helicone or LangSmith to monitor token consumption and latency.
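A bare-bones sketch of that pattern follows, with a hypothetical keyword router standing in for a cheap classification call: routing is deterministic, and anything irreversible passes through a human checkpoint.

```python
# Sketch of a controlled workflow: a deterministic router classifies the task, and a
# human checkpoint gates anything irreversible. The keyword router is illustrative;
# in practice that step might itself be a cheap, constrained LLM call.
def classify(task: str) -> str:
    if "refund" in task.lower():
        return "refund"
    if "summarize" in task.lower():
        return "summarize"
    return "other"

def handle_refund(task: str) -> str:
    draft = f"[draft refund decision for: {task}]"   # imagine an LLM drafting this
    approved = input(f"Approve? {draft} (y/n): ")    # human-in-the-loop checkpoint
    return draft if approved.strip().lower() == "y" else "escalated to a human agent"

def handle_summarize(task: str) -> str:
    return f"[summary of: {task}]"

ROUTES = {"refund": handle_refund, "summarize": handle_summarize}

def run(task: str) -> str:
    route = classify(task)
    handler = ROUTES.get(route)
    return handler(task) if handler else "routed to default queue"

# run("Please summarize yesterday's incident report")
```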

Having a technical opinion on workflows vs. agents is a superpower. Most companies are operating on “social media vibes”; the AI engineer provides the strategic direction and cost control necessary for enterprise scale.

4. The Return of the “CS Fundamentalist”

There is a persistent myth that AI makes Computer Science degrees obsolete. The reality is that as the cost of generating code drops to zero, the cost of the friction created by bad code—security flaws, technical debt, and architectural rot—skyrockets.

Andrew Ng notes that while 30% of traditional CS knowledge (like memorizing syntax) is fading, the remaining 70% is more vital than ever. You cannot verify or supervise AI-generated code if you do not understand the Critical Fundamentals:

  • Concurrency and Parallelism: Essential for managing asynchronous AI API calls and system throughput.
  • Memory and Performance Complexity: Vital for optimizing token usage and high-dimensional vector searches.
  • Networking Basics: Crucial for managing the distributed nature of modern AI services.
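As a small illustration of the first point, the asyncio sketch below fans out three independent (simulated) model calls instead of awaiting them one at a time; `fake_model_call` is a stand-in for a real async client request.

```python
# Why concurrency still matters: overlap independent model calls rather than serializing them.
import asyncio

async def fake_model_call(prompt: str) -> str:
    await asyncio.sleep(0.5)                      # simulated network latency
    return f"response to: {prompt}"

async def main():
    prompts = ["classify ticket 1", "classify ticket 2", "classify ticket 3"]
    # Total wall time is roughly 0.5s instead of 1.5s, because the calls overlap.
    results = await asyncio.gather(*(fake_model_call(p) for p in prompts))
    print(results)

asyncio.run(main())
```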

Deep technical knowledge is what builds the “design taste” required to know when to introduce an architectural principle and when to push back against a model’s suggestion.

5. Testing isn’t Dead—It Just Got a “Black Box” Problem

Traditional unit testing is insufficient for non-deterministic AI services. Because LLMs are “black boxes,” they require a new testing paradigm focused on Evals (evaluation sets).

Instead of testing for a specific string output, professional AI engineers utilize the LLM-as-a-judge pattern. By creating a “Gold Set” of ideal responses, you can use one LLM to score another’s output on a scale of 1 to 10. This allows you to:

  • Detect model drift or prompt regressions before they reach the user.
  • Safely upgrade or downgrade models (e.g., GPT-4o to a smaller, faster model) without breaking functionality.
  • Ensure that a minor prompt change by a teammate hasn’t compromised system logic.
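A skeletal version of such a harness is sketched below. The gold set, the 1-to-10 rubric, and `call_judge` are placeholders; the judge would be a real model call, and the system under test is whatever function turns a question into an answer.

```python
# Sketch of an eval harness using the LLM-as-a-judge pattern. `call_judge` is a placeholder
# for the judge-model client; the gold set and rubric are illustrative.
GOLD_SET = [
    {"question": "What is our refund window?", "ideal": "30 days from purchase."},
]

JUDGE_PROMPT = """Score the candidate answer against the ideal answer on a 1-10 scale.
Reply with the number only.
Question: {question}
Ideal answer: {ideal}
Candidate answer: {candidate}"""

def call_judge(prompt: str) -> str:
    raise NotImplementedError   # swap in a real model call

def run_evals(generate) -> float:
    """`generate` is the system under test (question in, answer out). Returns the mean score."""
    scores = []
    for case in GOLD_SET:
        candidate = generate(case["question"])
        raw = call_judge(JUDGE_PROMPT.format(candidate=candidate, **case))
        scores.append(float(raw.strip()))
    return sum(scores) / len(scores)

# Run this before and after any prompt or model change to catch regressions and drift.
```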

Flying blind with non-deterministic services is a recipe for losing customer trust. A rigorous testing mindset is now the primary differentiator between an “AI Bro” and a professional engineer.

Conclusion: Crossing the 3-Month Gap

The transition from a standard full-stack developer to a high-earning AI Engineer is a marathon, but the initial competency gap can be bridged in roughly one to three months by following a structured roadmap:

  • Phase 1: Integrate & Accelerate (Month 1): Adopt AI pair programmers (Cursor, Copilot) and agentic review tools. Focus on moving from simple comments to structured context engineering.
  • Phase 2: Architect & Orchestrate (Months 2-3): Build a RAG-based application. Store proprietary data in a vector database and implement a controlled workflow using a framework like LangGraph or a manual “human-in-the-loop” pattern.
  • Phase 3: Strategize & Lead (Ongoing): Develop a quality framework using Evals and LLM-as-a-judge. Quantify your impact on team velocity and begin managing the technical debt that AI code inevitably generates.

In tech-forward hubs like San Francisco, senior individual contributors who master this orchestration are commanding salaries between $200,000 and $350,000.

The question is no longer whether AI will change your job, but how you will respond to the shift. Do you want to be the developer struggling to compete with AI-generated syntax, or the orchestrator designing the systems that command it?

From Mainframe to Mindset: The Surprising Leap from COBOL to AI Intelligence

For decades, the enterprise has been haunted by the ghost of “legacy.” We’ve been told that the core logic of our businesses—the trillions of rows of data locked in 60-year-old COBOL files—is a liability, a frozen asset too fragile to touch and too complex to modernize. But as a digital transformation strategist, I see a different reality. This isn’t technical debt; it is the untapped IQ of your organization.

The “Legacy Logic” framework is shattering the traditional modernization roadmap. By leveraging Metadata Garage Services, the bridge between the mainframe and the frontier of AI has become remarkably short. We are no longer talking about a multi-year migration nightmare; we are talking about a fundamental shift in mindset that turns a “static garage” of records into a high-velocity AI Intelligence Hub.

The Zero-Refactor Revolution

The single greatest barrier to innovation is the “Prep-Work Myth.” Conventional wisdom dictates that before AI can even glance at legacy data, you must endure years of refactoring, manual coding, and grueling data normalization. For most CIOs, touching the legacy core is a high-stakes risk that threatens the very stability of production environments.

Metadata Garage Services provides the ultimate “read-only” path to intelligence, effectively breaking the shackles of technical debt without jeopardizing the system of record. The mandate is clear: you can now move toward “AI from your COBOL files with no coding, requirements, or preparation.”

By removing the need for manual intervention or system overhauls, we shift the culture of the IT department from “maintenance and defense” to “innovation and insight.” You don’t need to rewrite your history to benefit from the future; you simply need the right interface to access it.

The Automated On-Ramp: From Blind Storage to Statistical Clarity

Every failed digital transformation starts with messy data. In the legacy world, COBOL files are often “black boxes”—raw records that offer zero visibility to modern tools. To an LLM (Large Language Model), an unmapped mainframe file is just noise.

This is where the “Legacy Logic” tools provide an essential on-ramp. By processing COBOL data files and gathering automated statistics, these tools create a comprehensive “context map” of your historical data. We are moving from blind storage to instant visibility, transforming raw records into a viable, structured starting point for intelligence. This statistical baseline is the “ground truth” that allows an AI to navigate decades of enterprise memory with precision. It turns what was once “dark data” into a clear, searchable asset before a single prompt is even written.
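Purely as an illustration of what "automated statistics" over a fixed-width legacy file can mean (the field layout below is hypothetical, and in practice a copybook parser would supply it), a profiling pass might count nulls and distinct values per field to build that context map:

```python
# Illustrative only: field-level statistics over fixed-width legacy records.
# The layout is a made-up example; a real copybook parser would provide it.
from collections import Counter

LAYOUT = [("policy_id", 0, 8), ("state", 8, 10), ("premium", 10, 19)]  # (name, start, end)

def profile(lines: list) -> dict:
    stats = {name: {"null_count": 0, "distinct": Counter()} for name, _, _ in LAYOUT}
    for line in lines:
        for name, start, end in LAYOUT:
            value = line[start:end].strip()
            if not value:
                stats[name]["null_count"] += 1
            else:
                stats[name]["distinct"][value] += 1
    return stats

records = ["00012345TX00150.00", "00012346CA00210.50"]
print(profile(records))
```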

Conversational IQ: Turning Records into an Intelligence Hub

The true “Mindset” shift occurs when we stop viewing data as a report and start viewing it as a conversation. Through the integration of processed records into NotebookLM, we are creating a sophisticated AI Intelligence Hub that fundamentally changes how stakeholders interact with the past.

Imagine the power of moving away from a COBOL programmer writing a batch report that takes three days to execute. Instead, a CEO or Product Manager can ask a natural language question: “Compare our highest-performing insurance riders from 1985 against current market trends—what logic are we missing?”

By loading legacy records into a conversational notebook environment, the data is no longer a static archive; it is a live participant in strategic decision-making. This workflow turns the “Legacy Garage” into a fountain of insights, allowing the enterprise to “talk” to its history through a 21st-century interface.

The Future of the Mainframe

The transition from COBOL to AI is not about replacement; it is about liberation. Metadata Garage Services proves that the mainframe can remain a foundational asset while its data is freed to fuel modern competitive advantages. By automating the extraction and statistical mapping of legacy files, we bridge the gap between the mid-20th-century engine and the AI-driven future.

The technical hurdles have been cleared. The only remaining question is one of vision: What transformative insights are currently hidden in your own legacy “garage,” just waiting to be uncovered?

Chocolate cake, MDM, data quality, machine learning and creating the information value chain

The primary takeaway from this article is that you don’t start your machine learning, MDM, data quality, or analytical project with “data” analysis; you start with the end in mind, the business objective in mind. We don’t need to analyze data to know what it is; it’s like oil, water, sand, or flour.

Unless we have a business purpose for using these things, we don’t need to analyze them, because they are only ingredients in whatever we’re trying to make. What makes them important is the degree to which they are part of the recipe and how they are associated.

Business Objective: Make Dessert

Business Questions: The consensus is Chocolate Cake; how do we make it?

Business Metrics: Baked Chocolate Cake

Metric Decomposition: What are the ingredients and portions?

2/3 cup butter, softened

1-2/3 cups sugar

3 large eggs

2 cups all-purpose flour

2/3 cup baking cocoa

1-1/4 teaspoons baking soda

1 teaspoon salt

1-1/3 cups milk

Confectioners’ sugar or favorite frosting

So here is the point: you don’t figure out what you’re going to have for dessert by analyzing the quality of the ingredients. Their quality isn’t important until you put them in the context of what you’re making and how they relate, that is, how the ingredients are linked or chained together.

In relation to my example of dessert and a chocolate cake, you might only have one cup of sugar, the eggs might have sat out on the counter all day, the flour might be coconut flour, and so on. You make your judgment on whether or not to make the cake by analyzing all the ingredients in the context of what you want to make, which is a chocolate cake made with possibly warm eggs, coconut flour, and only one cup of sugar.

Again, belaboring the point: you don’t start your project by looking at a single entity, column, or piece of data until you know what you’re going to use it for in the context of meeting your business objectives.

Applying this to machine learning, data quality, and/or MDM, let’s take an example:

Business Objective: Determine Operating Income

Business Questions: How much do we make, and what does it cost us?

Business Metrics: Operating income = gross income – operating expenses – depreciation – amortization.

Metric Decomposition: What do I need to determine operating income?

Gross Income = Sales Amount from Sales Table, Product, Address

Operating Expense = Cost from Expense Table, Department, Vendor

Etc…

Dimensions to Analyze for quality.

Product

Address

Department

Vendor

You may think these are the ingredients of our chocolate cake with regard to business and operating income; however, we’re missing one key component: the portions, or relationships. In business, this means the association, hierarchy, or drill path that the business will follow when asking a question such as, why is our operating income low?

For instance, the CEO might first ask: in what area of the country are we making the least money?

After that, the CEO may ask: in that part of the country, what product is making the least money, who manages it, and what about the parts suppliers?

Product => Address => Department => Vendor

Product => Department => Vendor => Address
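As a toy sketch of the decomposition above (column names are illustrative, and depreciation and amortization are omitted for brevity), the joins are the relationships that let the same metric be recomputed along whichever drill path the CEO asks for:

```python
# Toy sketch: the joined relationships let operating income be rolled up along any drill path.
import pandas as pd

sales = pd.DataFrame({"product": ["A", "A", "B"], "region": ["East", "West", "West"],
                      "gross_income": [500, 300, 400]})
expenses = pd.DataFrame({"product": ["A", "A", "B"], "region": ["East", "West", "West"],
                         "operating_expense": [200, 250, 150]})

joined = sales.merge(expenses, on=["product", "region"])
joined["operating_income"] = joined["gross_income"] - joined["operating_expense"]

# Drill path 1: where are we making the least money?
print(joined.groupby("region")["operating_income"].sum().sort_values())
# Drill path 2: within that region, which product?
print(joined.groupby(["region", "product"])["operating_income"].sum().sort_values())
```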

Many times these hierarchies, drill-downs, associations, or relationships are based on the various legal transactions of related data elements that the company requires between its customers and/or vendors.

The point here is that we need to know the relationships, dependencies, and associations required for each business legal transaction we will have to build, in order to link these elements directly to the metrics required for determining operating income, and subsequently for answering questions about it.

No matter the project, whether we are developing a machine learning model, building an MDM application, or providing an analytical application, if we cannot provide these elements and their associations to a metric, we will not have answered the key business questions and will most likely fail.

The need to resolve these relationships is what drives the need for data quality, which is really a way of understanding what you need to do to standardize your data, because the only way to create the relationships is with standards and mappings between entities.

The key is mastering and linking the relationships or associations required for answering business questions; it is certainly not just mastering “data” without context.

We need MASTER DATA RELATIONSHIP MANAGEMENT

not

MASTER DATA MANAGEMENT.

So, final thoughts: the key to making the chocolate cake is understanding the relationships and the relative importance of the data/ingredients to each other, not the individual quality of each ingredient.

This also affects the workflow. Many inexperienced MDM data architects do not realize that these associations form the basis for the fact tables in the analytical area. These associations will be the primary path (workflow) the data stewards follow in performing maintenance; the stewards will be guided by these associations to maintain the surrounding dimensions/master entities. Unfortunately, some architects instead focus on the technology and not the business. Virtually all MDM tools are model-driven APIs and rely on these relationships (hierarchies) to generate workflow and maintenance screens. Many inexperienced architects focus on the MVP (Minimum Viable Product), or a short-term technical deliverable, and are quickly called to task because the cost incurred by the business is not lowered, and the final product (the chocolate cake) is delayed and will now cost more.

Unless the specifics of questionable quality in a specific entity or table are understood in the context of the greater business question and its associations, that entity cannot be excluded or included.

An excellent resource for understanding this context can be found by following John Owens.

Final, final thoughts: there is an emphasis on creating the MVP (Minimum Viable Product) in projects today. My take is that in the real world you need to deliver the whole chocolate cake; simply delivering the cake with no frosting will not do. In reality, the client wants to “have their cake and eat it too.”

Note:

Operating Income is a synonym for earnings before interest and taxes (EBIT) and is also referred to as “operating profit” or “recurring profit.” Operating income is calculated as: Operating income = gross income – operating expenses – depreciation – amortization.