[FULL WORKSHOP] AI Coding For Real Engineers - Matt Pocock, AI Hero (@mattpocockuk)
TL;DR
Matt Pocock’s core thesis is that classic software engineering fundamentals work better with AI than “specs-to-code” hype — he argues that keeping tasks small, preserving code awareness, and using feedback loops beats throwing giant PRDs at Claude and hoping for magic.
LLMs have a “smart zone” and a “dumb zone,” and Pocock’s practical marker is roughly 100k tokens — even with 1M-token context windows, he says models still get progressively worse as context grows, so developers should optimize for short, resettable sessions rather than endless chats and compaction.
His favorite planning move is a “grill me” skill that can ask 40 to 100+ questions until human and model share the same design concept — instead of having AI eagerly spit out a plan, he uses relentless Q&A to surface nasty edge cases like retroactive point backfills and streak rules before any implementation starts.
He replaces linear multi-phase plans with Kanban-style, vertically sliced issues that agents can grab independently — for the gamification demo in his Cadence course platform, he pushes AI toward end-to-end slices like “award points for lesson completion visible on dashboard,” not horizontal layers like “just build the schema first.”
Implementation should be AFK (away from keyboard), but planning and QA must stay human-in-the-loop — Pocock treats alignment as the “day shift” for humans and coding as the “night shift” for agents, then insists on manual QA and code review to inject taste and avoid shipping slop.
Bad codebases make bad agents, so he wants deep modules, strong tests, and TDD-heavy feedback loops — he cites John Ousterhout’s deep-vs-shallow modules idea, says feedback quality is the ceiling on AI coding quality, and shows agents writing tests first with red-green-refactor as a way to reduce cheating and improve output.
The Breakdown
AI Is New, but the Old Rules Still Win
Matt Pocock opens with a pretty grounded claim: yes, AI is a paradigm shift, but the fundamentals of software engineering still matter, maybe more than ever. He polls the room — most people code with AI every day, and basically everyone has been frustrated by it — which sets up his whole workshop as a response to that frustration.
The “Smart Zone,” the “Dumb Zone,” and Why Long Contexts Still Betray You
Borrowing a concept from Dex Horthy of HumanLayer, Pocock says LLMs do their best work early in a session and get noticeably worse as context grows. His rule of thumb is around 100k tokens, regardless of whether the model advertises 200k or 1M context, and he uses a football league analogy: just as every new team in a league adds a match-up against every existing team, every extra token adds attention relationships with every token before it, so complexity grows quadratically and quietly explodes.
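The league analogy can be sketched in a few lines — this is just the arithmetic behind the quadratic-growth claim, not anything from Pocock's tooling (the 100k figure is his rule of thumb, not a hard model limit):

```typescript
// With n tokens, every token can attend to every other, so the number of
// unordered pairwise relationships is n * (n - 1) / 2 — like match-ups in
// a football league where each team plays every other team once.
function attentionPairs(tokens: number): number {
  return (tokens * (tokens - 1)) / 2;
}

// Doubling the context from 50k to 100k tokens roughly quadruples the pairs:
const at50k = attentionPairs(50_000); // ~1.25 billion pairs
const at100k = attentionPairs(100_000); // ~5 billion pairs
console.log(at100k / at50k); // ≈ 4
```

Linear growth in context means quadratic growth in relationships the model has to track, which is one intuition for why quality degrades well before the advertised window is full.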
Why He Hates Compaction and Wants LLMs to Be Like Memento
Pocock says every LLM session has the same shape: system prompt, exploration, implementation, testing. He strongly prefers clearing context back to a tiny, stable starting state over compacting conversation history, because compacting creates “sediment,” while resetting gives you the same reliable baseline every time — “like the guy from Memento” forgetting and starting clean.
The “Grill Me” Skill: Alignment Before Planning
His first real workflow move is a tiny prompt skill called “grill me,” which tells the model to interview him relentlessly until they reach a shared understanding. Instead of letting Claude rush into a plan, he makes it ask one question at a time — points economy, streaks, retroactive progress, dashboard placement — and says this is how he builds a shared “design concept,” borrowing the term from Frederick P. Brooks.
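The control flow of a grill-me style interview can be sketched like this — `askModel` and `answer` are hypothetical stand-ins (the real skill is a prompt, not code), but the shape matches what he describes: one question per turn, looping until the model signals alignment:

```typescript
// Illustrative prompt in the spirit of the "grill me" skill.
const GRILL_ME = `Interview me about this feature before planning anything.
Ask exactly ONE question per turn. Dig into edge cases (points economy,
streaks, retroactive progress). Reply DONE only when you could write the
plan yourself with no open questions.`;

async function grill(
  askModel: (prompt: string) => Promise<string>, // hypothetical model call
  answer: (question: string) => Promise<string>, // the human's reply
): Promise<string[]> {
  const transcript: string[] = [];
  let next = await askModel(GRILL_ME);
  // Pocock reports 40 to 100+ questions for a real feature.
  while (next.trim() !== "DONE") {
    transcript.push(next);
    next = await askModel(await answer(next));
  }
  return transcript;
}
```

The one-question-per-turn constraint is the key design choice: it stops the model from eagerly producing a plan and forces the edge cases out one at a time.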
From Slack Message to PRD Without Falling for “Specs to Code”
The workshop demo starts from a fake Slack brief by Sarah Chin asking for gamification features in a course platform called Cadence because student retention is weak. Pocock is blunt that pure specs-to-code “sucks”: you cannot ignore the code and just keep editing specs, because “the code is your battleground,” so after grilling he turns the conversation into a PRD that captures the destination, user stories, implementation decisions, and test decisions.
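The four-part PRD he lands on can be sketched as a shape — the field names below are a paraphrase of "destination, user stories, implementation decisions, and test decisions," not a format Pocock publishes:

```typescript
// Illustrative PRD shape; the content comes out of the grilling session.
interface PRD {
  destination: string; // where we're going, in one paragraph
  userStories: string[]; // framed from the user's point of view
  implementationDecisions: string[]; // choices the Q&A already settled
  testDecisions: string[]; // how the work will prove itself
}

const gamificationPRD: PRD = {
  destination:
    "Students earn visible points on Cadence to improve retention.",
  userStories: ["As a student, I earn points when I complete a lesson."],
  implementationDecisions: [
    "Backfill points retroactively for past lesson completions.",
  ],
  testDecisions: [
    "Completing a lesson awards points visible on the dashboard.",
  ],
};
```

Capturing decisions alongside stories is what makes the PRD useful to a fresh-context agent later: it records what was already settled, so the agent does not relitigate it.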
From PRD to Kanban: Why Vertical Slices Beat Layer-by-Layer Work
Rather than asking AI for a numbered multi-phase plan, he converts the PRD into Kanban-style issues with dependencies so multiple agents can eventually work in parallel. His big correction to the model is memorable: AI naturally codes horizontally — schema first, API second, frontend third — but that delays feedback, so he pushes it toward tracer-bullet vertical slices that cut through all layers and produce something visible fast.
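The backlog shape can be sketched as issues with explicit dependencies — this is illustrative, not Sandcastle's actual schema, but it shows why vertical slices parallelize: any issue whose dependencies are done is ready for an idle agent:

```typescript
// Each issue is a vertical slice cutting through schema, API, and UI.
interface Issue {
  id: string;
  title: string;
  dependsOn: string[]; // ids that must be done first
}

const backlog: Issue[] = [
  {
    id: "g1",
    title: "Award points for lesson completion, visible on dashboard",
    dependsOn: [],
  },
  { id: "g2", title: "Daily streaks with visible counter", dependsOn: ["g1"] },
  {
    id: "g3",
    title: "Retroactive point backfill for past completions",
    dependsOn: ["g1"],
  },
];

// Issues whose dependencies are all complete can be claimed in parallel.
function ready(backlog: Issue[], done: Set<string>): Issue[] {
  return backlog.filter(
    (i) => !done.has(i.id) && i.dependsOn.every((d) => done.has(d)),
  );
}
```

Note that a horizontal plan ("build the schema first") would make every issue depend on every other; slicing vertically keeps the dependency graph shallow, which is the whole point.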
The “Night Shift”: AFK Agents, Ralph Loops, and Reviewers in Fresh Context
Once the backlog exists, humans step out and agents take over implementation. Pocock demos a bash-scripted “Ralph” loop and later a TypeScript tool called Sandcastle that creates worktrees, runs implementers, reviews commits in fresh context, and merges branches; notably, he uses Sonnet for implementation and Opus for review because he wants more intelligence on the reviewer side.
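The loop's shape can be sketched as follows — `runAgent` is a hypothetical stand-in for spawning a coding agent in a fresh context (the demo uses a bash script and Sandcastle; this illustrates the structure, not those tools), and the model split mirrors his Sonnet-implements, Opus-reviews choice:

```typescript
// Hypothetical agent runner: starts a fresh-context session and returns
// its final output (a commit id for implementers, a verdict for reviewers).
type Agent = (model: string, task: string) => Promise<string>;

async function ralphLoop(
  runAgent: Agent,
  issues: string[],
): Promise<string[]> {
  const merged: string[] = [];
  for (const issue of issues) {
    // Implementer works in its own worktree with a fresh context window.
    const commit = await runAgent("sonnet", `Implement: ${issue}`);
    // Reviewer gets the smarter model and a clean context of its own,
    // so it judges the diff rather than the implementer's transcript.
    const verdict = await runAgent("opus", `Review commit: ${commit}`);
    if (verdict === "approve") merged.push(commit);
  }
  return merged;
}
```

Reviewing in fresh context is the load-bearing detail: a reviewer that shares the implementer's context inherits its blind spots.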
TDD, Deep Modules, and the Human Taste Test at the End
He says TDD is one of the highest-leverage tricks for AI coding because red-green-refactor forces the agent to anchor work in testable behavior instead of writing code first and cheating later. He closes with John Ousterhout’s “deep modules” idea — simple interfaces, rich internals — and argues that better module boundaries and stronger feedback loops make agents dramatically better, but manual QA is still where humans reinsert judgment, taste, and quality control so the output doesn’t turn into slop.
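A red-green sketch in the gamification domain shows the ordering he wants the agent to follow — function and test names here are illustrative, not from the workshop's actual code:

```typescript
// RED: the assertions below are written first and fail until an
// implementation exists — anchoring the agent in observable behavior.
// GREEN: this is the minimal implementation that satisfies them.
function awardPoints(completedLessons: number, perLesson = 10): number {
  return completedLessons * perLesson;
}

// The tests, written before the implementation above:
console.assert(awardPoints(3) === 30, "3 lessons at 10 points each");
console.assert(awardPoints(0) === 0, "no lessons, no points");
// REFACTOR: with the tests green, the agent can restructure freely —
// any cheat (hardcoding 30, deleting the test) is visible in review.
```

The reason this curbs cheating: a code-first agent can declare victory on untested code, while a test-first agent has a concrete red bar it must turn green before moving on.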