What happens when a data consultancy actually tests what it's selling.

There's a pre-flight safety briefing happening in the data and AI industry right now, and most consultancies are delivering it from the wrong seat.

Every firm I know is repositioning as an AI consultancy. New branding. New service lines. Thought leadership about transformation, efficiency, unlocking value. You've seen the decks. The language is indistinguishable from firm to firm — and that's the tell.

Here's what nobody's saying out loud: consulting is almost entirely made up of exactly the work AI is best at. Analysis. Synthesis. Pattern recognition. Document production. Code generation. Structured reasoning from complex inputs. If any industry should be demonstrating dramatic efficiency gains right now, it's this one.

So I'll ask the question plainly: how much more efficient are they becoming?

Because I'm not seeing it. I'm seeing new positioning. I'm not seeing consultancies delivering the same outcomes with half the people in half the time.

Which brings me to the pre-flight briefing. In the event of a loss of cabin pressure, put your own oxygen mask on before assisting others. You can't help anyone until you've helped yourself first. Telling clients how to leverage AI while your own delivery model is unchanged isn't thought leadership. It's a pitch.

So I set aside three days to actually find out what was possible. Not a blog post. Not a rebrand. An experiment — deliberate R&D to answer a specific architectural question that's been sitting at the back of my mind.

The question: can you create a system where AI generates production-quality data engineering code consistently, correctly, and within strict governance boundaries — rather than just approximately and hopefully?

Here's what I found out.

What three days actually produced

The starting point was a blank page and a well-formed question. The end point was a fully specified, metadata-driven Microsoft Fabric architecture that could be handed to an engineer and used on a real engagement tomorrow.

Not a proof of concept. A pattern.

The deliverables at the end of three days:

  • ETL notebook templates — parameterised, standards-compliant, generated from approved design artefacts
  • DDL migrations — immutable, versioned, idempotent, following Fabric-specific constraints
  • A governed Python wheel — shared utilities with a proper release lifecycle
  • DAG orchestration — Fabric pipeline definitions driven by metadata
  • Purview lineage mapping — generated from the same source of truth that drives everything else
  • A code generation pipeline — that reads approved design artefacts and produces code into the correct folder structure, within defined governance boundaries

That last point is the one that matters. This isn't a code assistant that helps you write faster. It's a system where the design artefacts drive generation — and where the AI cannot generate from anything it hasn't been explicitly authorised to touch.

Getting there required understanding something important about how this tooling actually works. Which meant making a mistake first.

The tooling split you have to get right

There are two fundamentally different modes of working with AI on an engagement like this, and conflating them is where things go wrong.

Analysis and design mode — working through dimensional modelling decisions, API contracts, grain definition, business logic, governance rules. Iterative. Conversational. Back-and-forth. This is thinking work, and it needs a thinking environment. I ran this in the Claude app.

Engineering mode — reading approved design artefacts and generating notebooks, migration scripts, and metadata files into the correct folder structure in the repository. This is building work, and it needs a building environment. I ran this in Claude Code, directly in the repo.

Right tool, right job. The Claude app for thinking. Claude Code for building.

What breaks if you blur this? You end up doing design work in an environment optimised for generation, which means your design decisions are less rigorous than they should be. Or you end up trying to generate from a conversation history rather than from authoritative artefacts in version control, which means the AI is working from something approximating your intent rather than from your actual approved decisions.

The handover between these two modes is where the real architectural thinking had to happen.

The document chain: where the thinking actually lives

Two files do most of the heavy lifting. Understanding what's in them is understanding the pattern.

CLAUDE.md — the repo constitution

This file lives in the repository root. It's the first thing Claude Code reads before touching anything — and it's explicit about that: read it in full before taking any action; when instructions here conflict with anything else, this file wins.

It defines the hard rules the AI operates within. Some examples:

  • No hardcoded connection strings, workspace IDs, or environment values — ever
  • No string interpolation in SQL queries — always parameterised placeholders
  • Migration scripts are immutable — a new file for every schema change, never modify an existing one
  • Core is frozen between releases — spot a bug in core, don't fix it inline, raise it as a versioned change request
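The no-interpolation rule is the easiest of these to sketch. A minimal illustration, assuming a hypothetical `build_order_query` helper and a driver that accepts `?`-style placeholders (pyodbc, for example): the SQL string contains only placeholders, and the values travel separately.

```python
# Hypothetical illustration of the "no string interpolation in SQL" rule.
# The query text contains only placeholders; values travel as parameters.

def build_order_query(region: str, min_total: float):
    """Return (sql, params) for a driver that accepts ?-style
    placeholders (e.g. pyodbc). The caller passes params to
    cursor.execute and never formats values into the SQL string."""
    sql = (
        "SELECT order_id, order_total "
        "FROM dbo.orders "
        "WHERE region = ? AND order_total >= ?"
    )
    params = (region, min_total)
    return sql, params

# The forbidden pattern, for contrast -- never do this:
#   sql = f"... WHERE region = '{region}'"   # interpolation: breaks the rule
```

The design point is that the rule is mechanically checkable: generated code either carries values in a params tuple or it doesn't, which makes the standard enforceable in review rather than a matter of taste.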

It also documents platform-specific constraints that would otherwise cause generation errors. For instance, there's a known Fabric SQL Database DDL issue: BIGINT IDENTITY columns can't use an inline PRIMARY KEY in a CREATE TABLE statement. The workaround is an ALTER TABLE after creation, wrapped in an idempotency guard. That's the kind of constraint that creates a debugging death spiral if it's missing from the prompt context.
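Sketched as a migration file body, the workaround looks something like the following. Table and constraint names are illustrative, and the guard is written against the standard `sys.key_constraints` catalogue view; verify the exact syntax against current Fabric SQL Database documentation.

```python
# Illustrative migration body for the constraint described above: no
# inline PRIMARY KEY on the BIGINT IDENTITY column, so the key is added
# afterwards inside an idempotency guard. Names are hypothetical.

MIGRATION_0007 = """
CREATE TABLE dbo.dim_customer (
    customer_key BIGINT IDENTITY(1,1) NOT NULL,
    customer_id  NVARCHAR(50) NOT NULL
);

IF NOT EXISTS (
    SELECT 1 FROM sys.key_constraints
    WHERE name = 'PK_dim_customer'
)
ALTER TABLE dbo.dim_customer
    ADD CONSTRAINT PK_dim_customer PRIMARY KEY (customer_key);
"""
```

The guard is what makes the script idempotent: rerunning the migration against a database that already has the constraint is a no-op rather than an error, which is what the immutable, versioned migration rule above depends on.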

This file is the difference between an AI that understands your standards and one that generates something that looks right until it quietly isn't.

generate/system-prompt.md — the engineering mode contract

This file governs exactly what Claude Code does when generating ETL notebooks, DDL migrations, and metadata scripts. Before writing a single file, Claude Code must run a six-point pre-generation checklist and state its answers explicitly:

  1. Are all source-to-target mapping rows for this target table marked status = approved? If any are draft or reviewed, stop, name them, and ask.
  2. Does every target table appear in the Logical Data Model? If not, stop.
  3. Do any approved transformation rules still contain an unresolved warning marker? If yes, stop and list them.
  4. Is grain defined for every target table? If not, stop.
  5. Are business description, data owner, and sensitivity classification populated on every approved row?
  6. State scope explicitly: "Generating from N approved rows across M target tables: [list them]."

Only after all six checks pass does generation begin.
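The checklist is concrete enough to sketch as code. A minimal version, assuming the approved design artefacts have been parsed into simple dicts; every field name here (status, target_table, grain, owner and so on) is hypothetical, not the actual schema of my artefacts.

```python
# A sketch of the six-point pre-generation gate. Field names are
# hypothetical; the shape of the checks follows the checklist above.

def pre_generation_check(mapping_rows, logical_model_tables):
    problems = []

    # 1. Every mapping row must be status = approved.
    not_approved = [r["id"] for r in mapping_rows if r["status"] != "approved"]
    if not_approved:
        problems.append(f"rows not approved: {not_approved}")

    targets = {r["target_table"] for r in mapping_rows}

    # 2. Every target table must appear in the Logical Data Model.
    missing = targets - set(logical_model_tables)
    if missing:
        problems.append(f"targets missing from LDM: {sorted(missing)}")

    # 3. No row may still carry an unresolved warning marker.
    warned = [r["id"] for r in mapping_rows if r.get("warning")]
    if warned:
        problems.append(f"unresolved warnings on: {warned}")

    # 4. Grain must be defined for every target table.
    no_grain = sorted({r["target_table"] for r in mapping_rows
                       if not r.get("grain")})
    if no_grain:
        problems.append(f"grain undefined for: {no_grain}")

    # 5. Description, owner, and sensitivity must be populated.
    incomplete = [r["id"] for r in mapping_rows
                  if not all(r.get(f) for f in
                             ("description", "owner", "sensitivity"))]
    if incomplete:
        problems.append(f"metadata incomplete on: {incomplete}")

    if problems:
        return False, problems  # stop: name the failures, ask a human

    # 6. State scope explicitly before generating anything.
    scope = (f"Generating from {len(mapping_rows)} approved rows "
             f"across {len(targets)} target tables: {sorted(targets)}")
    return True, [scope]
```

The point of writing it this way is that the gate either passes with an explicit scope statement or fails with a named list of blockers; there is no silent partial pass.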

This is the governance layer made operational. The AI doesn't follow it on a good day and skip it on a bad one. It's instruction, not trust. And if asked to override it, it refuses and explains why.

The Bus Matrix has to exist before any source-to-target mapping. Grain has to be confirmed before any fact table is touched. A human-gated status lifecycle — draft, reviewed, approved — controls what the AI can act on. The AI can read status. It cannot modify it.
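That lifecycle is simple enough to sketch. A hypothetical version, assuming rows carry a status field and that only a human reviewer can call the transition function; the enforcement mechanism in practice lives in the prompts and the repository, not in a Python permission check.

```python
# A sketch of the human-gated status lifecycle: transitions are ordered
# and one-way, and generation tooling gets read access only.

ALLOWED = {"draft": "reviewed", "reviewed": "approved"}

def advance_status(row, actor):
    """Only a human reviewer may move a row forward, one step at a time."""
    if actor != "human":
        raise PermissionError("generation tooling may read status, never set it")
    current = row["status"]
    if current not in ALLOWED:
        raise ValueError(f"no transition from {current!r}")
    row["status"] = ALLOWED[current]
    return row

def can_generate(row):
    """The only status the generator is allowed to act on."""
    return row["status"] == "approved"
```

One-way transitions matter here: there is no path from approved back to draft in the table, so an approved artefact can only change via a new versioned row, which mirrors the immutable-migration rule.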

Claude Code reads both files directly from the repository. No copy-paste. No manual context-feeding. The approved artefacts are just there. The documents tell it what to do with them.

Document pipeline. Then code pipeline. In that order.

The honest take: beautiful code, and some wrong decisions

There's a part of this story that's easy to omit, and I'm not going to omit it.

The code is genuinely good. Well-documented, consistent, standards-compliant throughout. The kind of output a good senior engineer produces when they're not rushing — except it took minutes, not weeks. That's not marketing language. That's what I observed.

The decisions? That's where it gets interesting.

Recommended Python libraries with missing capabilities. Authentication approaches that looked correct until they didn't — and then you're in a debugging cycle that goes: try this, try that, try this again. If you don't already know how this stuff is supposed to work, you won't recognise the wrong turn until you're deep into it.

That observation is important enough to say plainly: this tool amplifies expertise, it does not replace it.

If you know how to build data platforms — Kimball-level dimensional modelling, Fabric-specific constraints, production-grade governance, the difference between a pattern that scales and one that looks like it scales — then this adds booster rockets to what you already do.

If you don't, you will generate confident, well-documented, beautifully formatted code that quietly leads you somewhere you don't want to be. The code won't look wrong. It'll look excellent. That's precisely what makes it dangerous in the wrong hands.

Know the craft. Then use the tool. In that order.

What this means for the architect role

Some harder thoughts on what this kind of pattern actually means for the profession.

Does it destroy engineering jobs? Possibly, in certain forms. But that's not the interesting question.

The interesting question is what it does to engineers who understand it. And the answer is: it makes them dramatically more effective. Work that took a team months now takes one person with the right skills a few days. That's not a productivity improvement. That's a different kind of role.

The engineers who should be paying close attention are those whose value currently lives in translation work — the mechanical conversion of agreed design into consistent, standards-compliant code. That's the part that gets automated first and most completely.

The architect skillset, it turns out, is exactly right for this moment.

For years, architecture meant producing elegant diagrams that someone else had to translate into working systems. There was always a gap — sometimes a chasm — between the design and what actually got built. That gap is closing fast.

With the right pattern behind you, an architect with genuine technical depth can take a requirement from stakeholder transcript to production-ready code without the translation layer in between. The design is the build. The artefacts drive the generation. The architect sees it through.

That changes what the role means. And it changes which skills matter.

Requirements capture — properly, not approximately. You can't govern what you haven't understood.

Data modelling — Kimball-level rigour. The AI generates from your design. A bad design generates bad code, beautifully.

Solution architecture — the pattern, the boundaries, the governance layer. That structure has to come from somewhere, and it has to come from a human who understands why it's structured that way.

Technical depth and breadth — enough to recognise the wrong turn before you're deep into it. This is the non-negotiable.

A working knowledge of AI tooling — not evangelism, not fear. Just fluency with what these tools actually do and don't do.

The architect role is becoming less about abstract design and more about actual delivery. For the right people, that's not a threat. It's what they always wanted the job to be.

The architects who develop that combination aren't going to be replaced by this technology. They're going to be doing things that weren't previously possible.

The question this leaves open

We are in the middle of something significant. The consultancies and practitioners who are figuring this out in their actual delivery — not in their proposals, not in their rebrands, but in how work actually gets done — are going to have a structurally different cost model and a fundamentally different conversation with clients.

Three days. Blank page to fully specified architecture.

That's us putting on our own mask first.

If you're building a data platform and want to understand what this pattern looks like in practice — or if you're leading a consultancy that wants to figure this out properly — I'd be interested in that conversation.

Get in touch →


Stephen Armory is the founder of Data Partners, a Microsoft Fabric consultancy for UK SMEs. Former Microsoft Senior Cloud Solutions Architect. Azure Solutions Architect Expert, Fabric Analytics Engineer, Fabric Data Engineer certified.