
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Michael Arnaldi’s core trick is brutally simple: clone the library repo into your project. Instead of hoping the model understands docs or MCP tools, he adds Effect as a git subtree in .repos/effect so the coding agent treats it like first-party code and copies real patterns.
LLMs don’t “learn” your preferences unless you encode them into the repo — he explains that models are fixed after training, so memory has to come from context, agents.md, generated pattern files, lint rules, and repo structure rather than repeated chat instructions.
Backpressure beats prompting: lint rules and errors are how you keep agents honest. Arnaldi turns all diagnostics into errors, bans TypeScript escape hatches such as the as, any, and unknown keywords, and even writes custom ESLint rules when models start sneaking in hacks like “as never as X”.
He built a working Effect v4 todo API from scratch in 90 minutes without hand-coding — using GPT-5.4, Bun, Vitest, Effect SQL, SQLite, and OpenAPI, he had the model research patterns first, then implement CRUD endpoints, tests, migrations, and generated docs.
His workflow is “spec-driven development” plus constant context resets, not giant autonomous runs — he prefers small markdown specs, fresh sessions to avoid context pollution, and simple bash-loop automation over elaborate agent architectures, because “with AI many times less is more.”
The bigger point is operational, not just ergonomic: AI apps need durable workflows — he closes by arguing that long-running LLM tasks make failure inevitable, which is why Effect’s clustering and workflow primitives matter for things like registration flows, email delivery, and resilient AI-powered processes.
Michael Arnaldi opens by saying he prepared “absolutely nothing” because vibe engineering has to be real, then immediately lands on the point of the whole session: this should really be called “just clone the [__] repo.” He says he hasn’t written code by hand since late summer, even for low-level TypeScript and Rust library work, which surprised him because he assumed AI would only really help in app-land.
He gives a crisp mental model for coding agents: they’re not learning like humans, they’re just operating inside a fixed-size context window on top of stale pretraining. Even a 1-million-token context window can hurt if you stuff it with too much unrelated material, so the job becomes architecting around a “dumb process” that needs the right context every time.
Arnaldi argues coding models were trained mainly to consume and emit code, not to navigate human docs or random MCP servers. That’s why he started cloning dependency repos directly into projects: node_modules gets ignored, .gitignore gets ignored, but a checked-in subtree gets treated as part of the codebase, so the model actually explores it and imitates upstream patterns.
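The subtree setup described above can be sketched as a small helper that builds the git command. This is only a sketch under assumptions: the .repos/effect prefix comes from the talk, while the upstream URL, the main ref, and the --squash flag are illustrative choices the summary does not confirm, and subtreeAddArgs is a hypothetical helper name.

```typescript
// Hypothetical helper: builds the argument list for `git subtree add`.
// Only the ".repos/effect" prefix comes from the talk; the URL, ref, and
// --squash flag below are assumptions made for illustration.
interface VendoredRepo {
  readonly prefix: string // directory inside the project
  readonly url: string // upstream repository URL
  readonly ref: string // branch or tag to pull
}

function subtreeAddArgs(repo: VendoredRepo): ReadonlyArray<string> {
  return [
    "subtree",
    "add",
    `--prefix=${repo.prefix}`,
    repo.url,
    repo.ref,
    "--squash" // collapse upstream history into one commit (assumed preference)
  ]
}

const effectRepo: VendoredRepo = {
  prefix: ".repos/effect",
  url: "https://github.com/Effect-TS/effect.git", // assumed upstream URL
  ref: "main" // assumed branch
}

// The full command line, ready to paste into a shell or pass to a process spawner.
const command = ["git", ...subtreeAddArgs(effectRepo)].join(" ")
```

Because the subtree is committed, it sits outside node_modules and is not hidden by .gitignore, which is the property the approach relies on.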
He spins up an empty Bun project live, joking that if the model derails he’ll start insulting it because “it cannot really answer you back.” Along the way he compares model behavior: old Sonnet 4 was “a kid with a knife running through the house,” GPT-5.4 is slower but more solid, and Anthropic’s policy restrictions pushed him toward OpenAI despite Opus still being stronger on some UI work.
Once the basics are working, he adds agents.md, turns diagnostics into hard errors, and stresses that this is what programming becomes now: shaping repositories so models can perform well at scale. He shows how his own projects evolved custom lint rules to stop bad habits: banning the TypeScript keywords as, any, and unknown, and even catching the model’s workaround of writing “as never as X”. He compares this to babysitting “a junior developer with a knife running through the kitchen.”
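To make the “as never as X” check concrete, here is a minimal text-level sketch. It is not Arnaldi’s rule and not an ESLint API: a real rule would inspect the AST, while this version uses a regular expression and can be fooled by strings or comments. The names findDoubleAssertions and exitCodeFor are hypothetical.

```typescript
// Text-level sketch of a "no double assertion" check. A real ESLint rule would
// walk the AST; this regex is a simplification and can also match inside
// strings or comments.
interface Finding {
  readonly line: number // 1-based line number
  readonly text: string // offending source line, trimmed
}

// Matches `as never as`, `as unknown as`, and `as any as`.
const DOUBLE_ASSERTION = /\bas\s+(?:never|unknown|any)\s+as\b/

function findDoubleAssertions(source: string): ReadonlyArray<Finding> {
  const findings: Array<Finding> = []
  source.split("\n").forEach((text, index) => {
    if (DOUBLE_ASSERTION.test(text)) {
      findings.push({ line: index + 1, text: text.trim() })
    }
  })
  return findings
}

// Treat every finding as a hard error: any hit yields a failing exit code.
function exitCodeFor(findings: ReadonlyArray<Finding>): number {
  return findings.length === 0 ? 0 : 1
}

const sample = [
  "const id = raw as never as TodoId",
  "const ok = parseTodoId(raw)"
].join("\n")

const findings = findDoubleAssertions(sample)
```

Returning a non-zero exit code for any finding is what turns the check into backpressure: the agent cannot finish the task until the hack is gone.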
Instead of asking the agent to build immediately, he has it inspect the Effect repo and write patterns/http-api.md, then later patterns/sql.md and patterns/testing.md. He calls this spec-driven development: generate a markdown plan, restart sessions often to avoid context pollution, and feed the model small, precise tasks rather than one giant prompt.
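The “small task, fresh session” loop might look roughly like the sketch below. Everything here is hypothetical: the talk describes a simple bash loop, and runAgent merely stands in for whatever CLI or API call starts a new agent session. The point it illustrates is that each prompt is rebuilt from the repo’s files instead of carrying over an earlier conversation.

```typescript
// Sketch of "one small task per fresh session". All names are hypothetical;
// `runAgent` stands in for whatever starts a brand-new agent session.
interface Task {
  readonly title: string
  readonly patternFile: string // for example "patterns/http-api.md"
}

type RunAgent = (prompt: string) => string

// Each prompt is built from scratch, so nothing leaks from earlier sessions.
function buildPrompt(task: Task): string {
  return [
    "Read agents.md and follow it.",
    `Read ${task.patternFile} and reuse the patterns it documents.`,
    `Task: ${task.title}`
  ].join("\n")
}

// One call per task: a fresh session each time instead of one long conversation.
function runSpec(tasks: ReadonlyArray<Task>, runAgent: RunAgent): ReadonlyArray<string> {
  return tasks.map((task) => runAgent(buildPrompt(task)))
}

const spec: ReadonlyArray<Task> = [
  { title: "Add the create-todo endpoint", patternFile: "patterns/http-api.md" },
  { title: "Add the SQLite migration for todos", patternFile: "patterns/sql.md" }
]

// Stub runner that records the prompts it receives.
const seenPrompts: Array<string> = []
const results = runSpec(spec, (prompt) => {
  seenPrompts.push(prompt)
  return "done"
})
```

The memory lives in agents.md and the pattern files, so a session can be thrown away at any point without losing what the project has learned.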
Using those patterns, the model assembles a todo API with create, update, list, and done/not-done flows backed by Effect SQL and SQLite, plus OpenAPI docs and tests. Arnaldi keeps the human-in-the-loop energy high by spotting weirdness in real time — duplicate code, plain string IDs instead of branded types, unnecessary test wrappers, and the classic agent move of changing a test just to make it pass.
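The “plain string IDs instead of branded types” complaint is easier to see in code. The sketch below uses plain TypeScript, not Effect’s own branding utilities, and the names TodoId, isTodoId, and parseTodoId are hypothetical. A type guard does the narrowing, so no as assertion is needed.

```typescript
// Plain-TypeScript sketch of a branded ID. This is not Effect's own branding
// API; TodoId, isTodoId, and parseTodoId are hypothetical names.
type TodoId = string & { readonly __brand: "TodoId" }

// A type guard narrows without a type assertion, so no `as` is needed.
function isTodoId(value: string): value is TodoId {
  return value.trim().length > 0
}

function parseTodoId(value: string): TodoId {
  if (!isTodoId(value)) {
    throw new Error("TodoId must be a non-empty string")
  }
  return value
}

interface Todo {
  readonly id: TodoId
  readonly title: string
  readonly done: boolean
}

function markDone(todo: Todo): Todo {
  return { ...todo, done: true }
}

const todo: Todo = { id: parseTodoId("todo-1"), title: "Write spec", done: false }
const finished = markDone(todo)

// const bad: Todo = { id: "todo-1", title: "x", done: false }
// ^ rejected by the compiler: a plain string is not a TodoId
```

With the brand in place, passing some other string where a todo ID is expected becomes a compile error instead of a runtime surprise.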
By the end they have a working API, a start command, OpenAPI docs, and a pushed public repo, all from an empty project and “zero Effect knowledge” at the start. He closes by zooming out: the real next step is durable workflows and clustering, because AI makes processes long-running, and once a request lasts a minute instead of 10 ms, failures become unavoidable — which is exactly why systems like Temporal and Effect’s workflow stack suddenly matter so much.
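The core idea behind durable workflows can be sketched in a few lines: record each completed step so that a retry resumes where the last attempt failed. This is a toy illustration, not Effect’s or Temporal’s API. Every name is hypothetical, the steps are synchronous for brevity, and the Map stands in for a real persistent store.

```typescript
// Sketch of the idea behind durable workflows: checkpoint each completed step
// so a retry skips work that already succeeded. Not Effect's or Temporal's API;
// the Map stands in for a database.
interface Step {
  readonly name: string
  readonly run: () => string
}

type Checkpoints = Map<string, string>

function runWorkflow(steps: ReadonlyArray<Step>, checkpoints: Checkpoints): ReadonlyArray<string> {
  return steps.map((step) => {
    const saved = checkpoints.get(step.name)
    if (saved !== undefined) {
      return saved // completed in an earlier attempt, skip the work
    }
    const result = step.run() // may throw; earlier checkpoints are kept
    checkpoints.set(step.name, result)
    return result
  })
}

let registerCalls = 0
let emailAttempts = 0

const registration: ReadonlyArray<Step> = [
  {
    name: "create-user",
    run: () => {
      registerCalls += 1
      return "user-1"
    }
  },
  {
    name: "send-email",
    run: () => {
      emailAttempts += 1
      if (emailAttempts === 1) throw new Error("mail server timed out")
      return "sent"
    }
  }
]

const store: Checkpoints = new Map()
let firstAttemptFailed = false
try {
  runWorkflow(registration, store)
} catch {
  firstAttemptFailed = true
}

// The retry reuses the "create-user" checkpoint and only re-runs the email step.
const secondAttempt = runWorkflow(registration, store)
```

In this run the user is created once even though the workflow executes twice, which is the guarantee a registration flow needs when a slow step fails partway through.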