Back to Podcast Digest
AI Engineer18m

Don't Build Slop (4 Levels of AI Agent Maturity) - Ara Khan, Cline

TL;DR

  • Ara Khan’s core warning is simple: don’t build “slop” just because agents can generate a lot of code fast — he argues the architecture and state machine still need human thought, even if AI does most of the typing.

  • He frames agent building as four maturity levels: frameworks, custom state machines, Kanban UX, and cloud deployment — the idea is to start with something like LangChain or LangGraph for a 30-minute proof of value, then move up only when the problem deserves it.

  • His most important implementation rule is to treat every agent as a state machine, not magic — whether it’s Cursor, Codex, or Cline, he says it all reduces to a recursive while loop with conditions, transitions, and end states you should be able to visualize.

  • More prompt engineering and more logic often make frontier models worse, not better — Khan cites the Codex repo, where the GPT-5 prompt is about one-third the size of the GPT-5.3 prompt, as evidence that newer models often need less instruction, not more.

  • He thinks the best UX for managing multiple agents is a Kanban board because humans are now inference-bound — if one agent is working for 8 to 10 minutes, the practical move is to run two or three in parallel and manage them like an engineering manager overseeing ICs.

  • His endgame is cloud agents running long, parallel tasks from anywhere — he describes sending 15- to 20-minute jobs from his phone, including UI testing flows like signing in, changing VS Code settings, and retrying failures until the agent can open a PR.

The Breakdown

A Reality Check for People Panicking About Agents

Ara Khan opens by naming the weird vibe in the room: everyone feels surrounded by magical robots and has no idea whether they should unleash 15 agents at once or review every line by hand. His pitch is basically, “guys, let’s slow down,” and replace the conference-hall panic with a practical ladder for building useful agents.

Everything Looks the Same, So Pick a Real Strategy

He jokes that Factory, Codex, and Cursor UIs are now so similar that nobody can reliably tell which is which — including him. That sameness is his setup for the real point: stop chasing surface aesthetics and think about what maturity level your agent work is actually at.

Level 1: Frameworks Are Fine for Finding Out if the Idea Even Works

For the first stage, Khan is surprisingly pragmatic: if you’re testing a basic workflow like aggregating emails or automating some rudimentary task, use a framework like LangChain or LangGraph and get something working in half an hour. His caveat is blunt, though — once you want serious production behavior, frameworks usually stop giving you the modularity and customizability you need.

Level 2: Real Agents Are Just State Machines With Sharp Edges

His biggest implementation heuristic is to think of every agent as a state machine: not magic, just a recursive while loop with conditions and end states. He walks through a simple example — user asks to read files, the agent enters a read-files state, uses a tool, realizes it has enough info, then calls completion — and says that if you can hold that model in your head, agent design gets dramatically easier.

Five Rules for Not Making Your Agent Worse

Khan’s second rule is almost anti-hype: every extra thing you add risks degrading performance, whether that’s giant system prompts, edge-case logic, or fancy if/else trees. He says frontier models often do better when you “get out of the way,” points to shorter prompts in newer Codex setups, and notes Cline has apparently been rewritten from scratch around seven times to strip accumulated junk.

Build So Other Agents Can Work on Your Agent

The third rule gets delightfully meta: make your agent easy to build and test via CLI so coding agents can improve it too. He describes a “pseudo RL pipeline” where AI isn’t just being guided by humans anymore — humans are increasingly shaping systems so AI can navigate them, run tests, make changes, and verify everything end to end in parallel threads.

Don’t Get Locked In by Frontier API Weirdness

Rule four is “don’t be a slob”; rule five is that frontier labs want lock-in. His concrete example is reasoning traces in newer models like Opus 4.6 and Gemini 1.5 Pro / 5.3: if you don’t send those traces back in exactly the expected format, things still appear to work, but performance quietly degrades and you may never notice.

Kanban and Cloud Agents as the Final Form

Khan’s strongest product take is that Kanban is the right UI for agents because humans are now inference-bound: while one agent runs for 8 to 10 minutes, you should be supervising others in parallel. Pair that with cloud agents, and you get long-running jobs that can do QA clicks, terminal tests, and repeated retries for 15 to 60 minutes — sometimes launched from his phone — until all you have to do is come back and pull the PR.

Share