
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Ara Khan’s core warning is simple: don’t build “slop” just because agents can generate a lot of code fast — he argues the architecture and state machine still need human thought, even if AI does most of the typing.
He frames agent building as four maturity levels: frameworks, custom state machines, Kanban UX, and cloud deployment — the idea is to start with something like LangChain or LangGraph for a 30-minute proof of value, then move up only when the problem deserves it.
His most important implementation rule is to treat every agent as a state machine, not magic — whether it’s Cursor, Codex, or Cline, he says it all reduces to a recursive while loop with conditions, transitions, and end states you should be able to visualize.
More prompt engineering and more logic often make frontier models worse, not better — Khan cites the Codex repo, where the GPT-5 prompt is about one-third the size of the GPT-5.3 prompt, as evidence that newer models often need less instruction, not more.
He thinks the best UX for managing multiple agents is a Kanban board because humans are now inference-bound — if one agent is working for 8 to 10 minutes, the practical move is to run two or three in parallel and manage them like an engineering manager overseeing ICs.
His endgame is cloud agents running long, parallel tasks from anywhere — he describes sending 15- to 20-minute jobs from his phone, including UI testing flows like signing in, changing VS Code settings, and retrying failures until the agent can open a PR.
Ara Khan opens by naming the weird vibe in the room: everyone feels surrounded by magical robots and has no idea whether they should unleash 15 agents at once or review every line by hand. His pitch is basically, “guys, let’s slow down,” and replace the conference-hall panic with a practical ladder for building useful agents.
He jokes that Factory, Codex, and Cursor UIs are now so similar that nobody can reliably tell which is which — including him. That sameness is his setup for the real point: stop chasing surface aesthetics and think about what maturity level your agent work is actually at.
For the first stage, Khan is surprisingly pragmatic: if you’re testing a basic workflow like aggregating emails or automating some rudimentary task, use a framework like LangChain or LangGraph and get something working in half an hour. His caveat is blunt, though — once you want serious production behavior, frameworks usually stop giving you the modularity and customizability you need.
His biggest implementation heuristic is to think of every agent as a state machine: not magic, just a recursive while loop with conditions and end states. He walks through a simple example — user asks to read files, the agent enters a read-files state, uses a tool, realizes it has enough info, then calls completion — and says that if you can hold that model in your head, agent design gets dramatically easier.
Khan’s second rule is almost anti-hype: every extra thing you add risks degrading performance, whether that’s giant system prompts, edge-case logic, or fancy if/else trees. He says frontier models often do better when you “get out of the way,” points to shorter prompts in newer Codex setups, and notes Cline has apparently been rewritten from scratch around seven times to strip accumulated junk.
The third rule gets delightfully meta: make your agent easy to build and test via CLI so coding agents can improve it too. He describes a “pseudo RL pipeline” where AI isn’t just being guided by humans anymore — humans are increasingly shaping systems so AI can navigate them, run tests, make changes, and verify everything end to end in parallel threads.
Rule four is “don’t be a slob”; rule five is that frontier labs want lock-in. His concrete example is reasoning traces in newer models like Opus 4.6 and Gemini 1.5 Pro / 5.3: if you don’t send those traces back in exactly the expected format, things still appear to work, but performance quietly degrades and you may never notice.
Khan’s strongest product take is that Kanban is the right UI for agents because humans are now inference-bound: while one agent runs for 8 to 10 minutes, you should be supervising others in parallel. Pair that with cloud agents, and you get long-running jobs that can do QA clicks, terminal tests, and repeated retries for 15 to 60 minutes — sometimes launched from his phone — until all you have to do is come back and pull the PR.
Share
Keep Reading
The Weekly Echo. The inbox-shaped summary of what mattered.
New editorials announced here.

Playbook
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.

Playbook
Learn how tasteful prompting helps you move beyond generic AI output by shaping context, style, and judgment from the start.

Playbook
OpenAI shipped /goal for the Codex CLI. It turns a prompt into a persisted, self-continuing contract.