I’ve forgotten how to write code, or at least I think I have. Hard to be sure; I haven’t done it for a while. But then, I start to muse, when did I last rack a brand new server and install Linux? If such a physical process can be reduced to a one-liner with Terraform, why should coding be sacred?
No matter, I can still read code. And docs. And plans. Megatokens of it, and sometimes I even pay attention to what any of it says. Mostly I’m just there to press the big “I’m accountable” button on the screen (at least, that’s what I imagine it says). “I, Lars Janssen, hereby certify that I asked at least two LLM agents to rip the submitted change to shreds in the name of review.”
But occasionally, I have to dig into the structure, pull up the code and make sense of what the different factions of bots have created. I am the final arbiter of good taste, even though every bikeshedding argument online has been distilled into the models so they probably know better than I what good is supposed to look like.
Is that where we’ll be at the end of the decade? A few months ago that future felt comfortably distant. Then something shifted.
Right now, the enthusiasts are hooked. They can’t leave their desks without at least two agents spun up, pondering away, doing their bidding. You’re simply not productive if you’re not burning tokens while you’re on the loo. Meanwhile the sceptics complain — not without good cause — that AI is slowing them down, and they could do it faster themselves.
Both camps are right. Here’s what it actually feels like today:
- Your agent produces an impressive diff in ten minutes. You spend an hour making sure it hasn’t missed something that’ll bite you in the arse later.
- Context evaporates. 200,000 tokens sounds generous until the agent starts compressing your conversation and forgets what you agreed ten minutes ago.
- Output is mind-numbingly verbose. You ask for a focused change and get a dissertation with unsolicited comments and gratuitous refactoring.
- The tooling integrations are hit and miss. Some MCPs are brilliant. Others feel like someone scribbled the API docs on the back of an envelope and let the model figure out the rest.
And yet. Despite all of this, something has shifted. Nobody’s arguing about whether it works any more. They’re arguing about how.
From party trick to production
A few years ago, ChatGPT landed and the world briefly took leave of its senses. In a blog post at the time, I called it a “brain in a box” — powerful inference, zero connectivity. Imagine if Apple had released the iPhone but with no network. Impressive tech demo, useless for real work. You could copy-paste snippets in and out, and that was about it.
Last year, the tooling caught up somewhat. Autocomplete gave way to the beginning of agentic workflows. But it was still clunky — connectivity was limited, accessibility was poor, and the models would quickly wander off and do their own thing if you took your eyes off them for even a minute.
What changed? Several things, all at once.
The models got properly good — not perfect, but good enough that you could give an agent a real task and get something coherent back, even if you had to tap “yes” fifty times to get through the permission prompts. By Opus 4.5 and GPT-5, people who’d been dismissive started paying attention.
The products matured alongside them. Terminal-native agents that could reach into large, legacy codebases and actually figure out what was going on. Ergonomic enough that you stopped fighting the tool and started working with it.
And we got better at using it. Prompting is a skill. Scoping tasks for agents is a skill. Knowing when to trust the output and when to bin it is a skill.
It wasn’t one breakthrough. It was the compound effect of better models, better tooling, and more experienced users — all arriving at the same time. Like the early internet: nobody remembers the exact day it became useful. It just… did.
When the tooling clicks
The real shift isn’t smarter models. It’s what happens when you plug them into your actual systems.
When I connected Claude Code to our Snowflake data warehouse, a neat tool to help with writing SQL turned into a full-blown analyst. It started trawling through the schemas by itself, cross-referencing against the code and Confluence pages, and came back with insights I hadn’t even thought to look for.
Not “AI writes code for me” but “AI can actually act in the world through well-defined tools.” When the integrations are good, the agent stops being a fancy autocomplete and starts being a genuine collaborator that can investigate, cross-reference, and propose.
When they’re bad, it’s like giving an intern a map where half the streets are made up.
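What “well-defined tools” means in practice is simpler than it sounds. As a minimal sketch of the pattern — every name here (`query_warehouse`, `TOOLS`, `dispatch`) is made up for illustration, and real agent frameworks like MCP do considerably more — each tool the agent may call gets a description, a parameter contract, and validation before anything touches the real system:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str            # what the model reads to decide when to call it
    handler: Callable[..., Any]
    required_params: set[str] = field(default_factory=set)

def query_warehouse(sql: str) -> str:
    # Stand-in for a real integration (e.g. a Snowflake connector).
    return f"ran: {sql}"

TOOLS = {
    "query_warehouse": Tool(
        name="query_warehouse",
        description="Run a read-only SQL query against the warehouse.",
        handler=query_warehouse,
        required_params={"sql"},
    ),
}

def dispatch(call: dict[str, Any]) -> Any:
    """Validate an agent's tool call, then execute it."""
    tool = TOOLS.get(call.get("name", ""))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')}")
    args = call.get("arguments", {})
    missing = tool.required_params - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return tool.handler(**args)
```

The quality of the whole experience lives in those descriptions and contracts: a tool with a clear spec is the good MCP; a tool whose docs were scribbled on the back of an envelope is the made-up map.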
Verification debt
Here’s the thing that’s quietly dawning on everyone: sure, we’re writing less code. But we’re replacing it with verification work.
The agent can produce a plausible-looking diff in minutes. Tests pass. The commit message is better than half the ones humans write. The PR looks clean. And therein lies the trap — because “looks right” is not the same as “is right.”
Call it verification debt: the growing gap between how fast we can generate output and how fast we can validate it. Every time you click approve on a diff you haven’t fully understood, you’re borrowing against the future. And unlike technical debt, which usually announces itself through mounting friction — slow builds, tangled dependencies, the creeping dread every time you touch that one module — verification debt breeds false confidence. The codebase looks clean. The tests are green. And six months later you discover you’ve built exactly what the spec said — and nothing the customer actually wanted.
Instead of asking “how do we produce more code?”, ask “how do we verify more code?” That’s the real question for 2026.
A sensible verification checklist, right now:
- Is the agent implementing the right logic, or faithfully coding up a flawed spec? It won’t question your intentions — unless explicitly asked.
- What assumptions did the agent make about the domain?
- What permissions, data access, or side effects does this change introduce?
- Would you stake your name on this doing what the user actually needs — not just what the ticket says?
If the answer to that last one is “probably,” you haven’t finished reviewing.
The human bottleneck
Here’s an uncomfortable truth: if AI makes every engineer even 50% more productive, the org doesn’t get 50% more output. It gets 50% more pull requests, 50% more documentation, 50% more design proposals — and someone has to review all of it.
When a few early adopters generate more PRs, the team absorbs it. When everyone does it, review becomes the constraint. The bottleneck doesn’t disappear. It moves upstream, to the parts of the job that are irreducibly human: deciding what to build, defining “done,” understanding the domain, and making judgment calls about risk and trade-offs.
Nobody wants to review AI slop. There’s a fair expectation that you check your own output before submitting it. But my outbox is piling up with the output of eager agents faster than I can slog through it.
Software engineering has always been knowledge work — analysing, sharing context, building shared understanding. AI can help you find information faster, but you still have to understand it.
Most of my day is me asking the agents questions. “Great question!”, they say, even when I asked it for the umpteenth time, because now that we’re all “10x” developers it’s hard to follow the details of so many projects. AI doesn’t remove the cognitive load. It transforms it.
The worry isn’t only about jobs — it’s that we’ll stop thinking. I’ve heard people say in the office, half-joking I assume, that by the end of the year none of us will even think anymore.
It’s the same fear people had when Google became a thing. Why reason through the documentation when you can search for the answer? Here’s what actually happened: we stopped memorising API signatures and started solving harder problems. The skill shifted, but it didn’t shrink. AI is the same pattern, one level up.
I could be wrong about the specifics. Maybe context windows plateau. Maybe the integrations stay flaky for years. The details don’t matter. The direction is irreversible.
The agents make output cheap. They don’t make responsibility cheap.
I’ll still be at my desk tomorrow, agents spun up, pressing the “blame me” button.