
Replay logs break down for real agents: Eric Allam argues the classic durable execution model works for workflows like order processing, but an agent’s endlessly growing loop of LLM calls and tool calls eventually hits replay limits in size, entry count, or versioning complexity.
Agents are sessions, not transactions: his core framing is that workflows have a start and an end, while agents persist as long as the user wants, which makes the old stateless backend pattern a poor match for multi-hour or multi-day work.
Durable agents need two different persistence layers: Allam splits the problem into an append-only context log for prompts, tool calls, and outputs, plus snapshot/restore for execution state like cloned repos, installed packages, subprocesses, and in-memory data.
Snapshotting compute is what preserves the ‘machine’ side of an agent: instead of reconstructing execution from logs, Trigger.dev snapshots the machine when the user ‘goes to lunch’ and restores it later, making long pauses cheap without losing files, memory, or running processes.
Trigger.dev moved from CRIU to Firecracker microVMs: after shipping CRIU-based snapshots in 2024 and doing millions of restores, they hit limitations around subprocesses, files, and container registry slowness, then switched to full-machine Firecracker snapshots.
Compression made whole-machine snapshots practical: a naive 512 MB VM snapshot was too expensive, but with seekable compression and layering, Trigger.dev got snapshots down to about 14 MB compressed, with snapshot times under a second, restores in a few hundred milliseconds, and roughly 15,000 VM starts per minute.
Eric Allam opens with the familiar agent loop — good enough on your laptop, not good enough for production. The real requirements are heavier: long-running work, durability across turns and code versions, and recovery from failures. He frames the whole talk as a backend infrastructure shift, with a joke about a meme where you can’t tell which one is the human and which one is the agent.
He does a quick history sprint from CGI in 1993 to PHP, LAMP, Rails, Node, and serverless. The common pattern is “request + DB = response,” aka shared-nothing architecture, where meaningful state lives in the database and compute stays disposable. That model dominated because every new request could just redo the work from stored state.
As apps got more complex, side effects turned into multi-step async tasks: send email, charge card, resize image, process order. The failure mode is obvious and painful: if the send-receipt step fails, you really do not want to rerun the whole task and charge the credit card twice. Durable workflow engines fixed that by caching the result of each step and replaying execution, giving you audit trails and resume points, but also forcing code into a rigid, deterministic shape.
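The step-caching idea can be illustrated with a minimal sketch. This is not the API of any particular workflow engine: `Journal`, `step`, `charge_card`, and `send_receipt` are hypothetical names, and the journal lives in memory where a real engine would persist it.

```python
from typing import Any, Callable, Dict, List


class Journal:
    """Hypothetical in-memory journal; a real engine would persist this."""

    def __init__(self) -> None:
        self.results: Dict[str, Any] = {}

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A step that completed in an earlier attempt is served from the
        # journal, so its side effect does not run a second time.
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result
        return result


charges: List[str] = []   # records every real charge (the side effect)
receipt_attempts = 0


def charge_card() -> str:
    charges.append("charge-1")
    return "charge-1"


def send_receipt() -> str:
    global receipt_attempts
    receipt_attempts += 1
    if receipt_attempts == 1:
        # Simulated transient failure on the first attempt only.
        raise RuntimeError("email provider down")
    return "receipt-sent"


def process_order(journal: Journal) -> str:
    charge_id = journal.step("charge_card", charge_card)
    status = journal.step("send_receipt", send_receipt)
    return f"{charge_id}:{status}"


journal = Journal()
try:
    process_order(journal)        # first attempt fails at send_receipt
except RuntimeError:
    pass
outcome = process_order(journal)  # replay: charge_card comes from the journal
```

The replay only works because the step names and their order are the same on every attempt, which is one way to see why such engines push code toward a deterministic shape.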
At first, LLMs fit neatly into that world as just another step: classify some text and move on. But once tool calling got good, the orchestration inverted: now the LLM is orchestrating the code. Allam says that if you make an agent loop durable with replay, every LLM call and every tool call becomes another journal entry, and the log keeps growing turn after turn until it runs into replay limits on size, entry count, or versioning complexity.
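A rough back-of-the-envelope model shows why the journal becomes a problem. It assumes one journal entry per LLM call and per tool call, and assumes every turn resumes by replaying the whole journal so far. Both assumptions and the numbers below are illustrative and do not come from the talk.

```python
def journal_entries(turns: int, tool_calls_per_turn: int) -> int:
    # One entry for the LLM call plus one per tool call, every turn.
    return turns * (1 + tool_calls_per_turn)


def total_replayed(turns: int, tool_calls_per_turn: int) -> int:
    # If each turn starts by replaying the journal as it stood before
    # that turn, total replay work is the sum of those journal sizes.
    return sum(journal_entries(t, tool_calls_per_turn) for t in range(turns))


# Hypothetical session: 200 turns with 4 tool calls each.
entries = journal_entries(200, 4)
replayed = total_replayed(200, 4)
```

Under these assumptions the journal grows linearly with the number of turns, while the cumulative replay work grows quadratically.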
His cleanest idea is splitting the agent into two halves. One is the append-only context log — system messages, user messages, tool calls, tool results, assistant responses — which is extremely valuable and easy to make durable with databases, object storage, or distributed filesystems. The other is execution state: the messy machine stuff like cloned GitHub repos, installed packages, in-memory datasets, subprocesses, and dev servers.
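The context-log half can be sketched as a small append-only structure. The names `Entry`, `ContextLog`, and the role labels are hypothetical, and a Python list of JSON lines stands in for the database, object storage, or filesystem that would hold the log in practice.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    seq: int       # position in the log, assigned at append time
    role: str      # e.g. "system", "user", "assistant", "tool_call", "tool_result"
    content: str


class ContextLog:
    def __init__(self) -> None:
        # Stand-in for durable storage: one JSON document per line.
        self._lines: List[str] = []

    def append(self, role: str, content: str) -> Entry:
        # Entries are only ever added at the end, never edited or removed.
        entry = Entry(seq=len(self._lines), role=role, content=content)
        self._lines.append(json.dumps(asdict(entry)))
        return entry

    def load(self) -> List[Entry]:
        # Rebuild the full conversation, e.g. after a crash or a deploy.
        return [Entry(**json.loads(line)) for line in self._lines]


log = ContextLog()
log.append("system", "You are a coding agent.")
log.append("user", "Clone the repo and run the tests.")
log.append("tool_call", "git clone <repo>")
log.append("tool_result", "Cloned into 'repo'.")
history = log.load()
```

Because the log is plain data, it can be reloaded by a new process or a new code version without re-executing anything, which is what separates it from the execution state.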
That execution state, he says, cannot realistically be rebuilt from a log. If the user disappears for a while, you can’t afford to keep the machine live, so the answer is snapshot and restore: freeze the machine, save it, bring it back when the next message arrives. That gives you cheap durability across turns, while the context log separately gives you recovery across crashes and code upgrades.
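The snapshot-and-restore lifecycle can be sketched as a toy model. `Machine` and `Session` are hypothetical names, and the "snapshot" here is only a JSON copy of two dictionaries. A real snapshot, as described in the talk, captures the whole machine, including memory and running processes, which this sketch does not attempt.

```python
import json
from typing import Dict, Optional


class Machine:
    """Toy stand-in for a VM: a few files plus some in-memory data."""

    def __init__(self, files=None, memory=None) -> None:
        self.files: Dict[str, str] = dict(files or {})
        self.memory: Dict[str, str] = dict(memory or {})


class Session:
    def __init__(self) -> None:
        self.machine: Optional[Machine] = Machine()
        self.snapshot: Optional[str] = None

    def on_idle(self) -> None:
        # The user went quiet: save the state and release the live machine.
        if self.machine is None:
            return
        self.snapshot = json.dumps(
            {"files": self.machine.files, "memory": self.machine.memory}
        )
        self.machine = None

    def on_message(self, text: str) -> Machine:
        # The next message arrived: restore from the snapshot if needed.
        if self.machine is None:
            if self.snapshot is None:
                raise RuntimeError("no live machine and no snapshot")
            state = json.loads(self.snapshot)
            self.machine = Machine(state["files"], state["memory"])
        self.machine.memory["last_message"] = text
        return self.machine


session = Session()
vm = session.on_message("clone the repo")
vm.files["repo/README.md"] = "hello"
session.on_idle()                      # nothing is running while the user is away
restored = session.on_message("continue")
```

The point of the sketch is the shape of the lifecycle: compute exists only between a message arriving and the session going idle, yet the files written earlier are still there on the next turn.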
Allam makes the point that this idea is old: IBM mainframes had checkpoint/restore back in 1966 because expensive jobs couldn’t be rerun from scratch. Trigger.dev first used CRIU, a Linux process snapshot system he describes as injecting a “parasite” into a process to dump its memory. He says they shipped it in 2024 and ran millions of snapshot restores. But it only really captured a process, struggled with things like Chrome and FFmpeg, ran into limitations around open files, and got slow once container registries entered the picture.
Their answer was moving to Firecracker microVMs, which let them snapshot the entire machine, not just a process. A naive 512 MB snapshot was too costly, so they used seekable compression and layering to shrink snapshots to roughly 14 MB compressed, with snapshots taking a bit under a second and restores only a few hundred milliseconds. He ends by introducing FC Run, a Docker-like CLI for running, snapshotting, restoring, and even forking Firecracker VMs, with benchmarks around 15,000 VM starts per minute — the infrastructure he thinks points toward a stateful-compute future for agents.
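One common way to make a compressed image seekable is to compress it in independent chunks and keep an index of where each chunk starts, so a reader can decompress only the chunks it needs. The sketch below shows that general technique with Python's `zlib`. It is not Trigger.dev's snapshot format, and the chunk size and function names are assumptions made for illustration.

```python
import zlib
from typing import List, Tuple

CHUNK = 64 * 1024  # hypothetical chunk size

Index = List[Tuple[int, int]]  # (offset in blob, compressed length) per chunk


def compress_seekable(data: bytes, chunk: int = CHUNK) -> Tuple[bytes, Index]:
    """Compress each chunk independently and record where it lands."""
    blob = bytearray()
    index: Index = []
    for start in range(0, len(data), chunk):
        packed = zlib.compress(data[start:start + chunk])
        index.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def read_range(blob: bytes, index: Index, offset: int, size: int,
               chunk: int = CHUNK) -> bytes:
    """Read `size` bytes at `offset`, decompressing only overlapping chunks."""
    if size <= 0:
        return b""
    first = offset // chunk
    last = min((offset + size - 1) // chunk, len(index) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        pos, length = index[i]
        out += zlib.decompress(blob[pos:pos + length])
    skip = offset - first * chunk
    return bytes(out[skip:skip + size])


# A mostly empty 1 MiB "memory image" with a few bytes of real data.
image = bytearray(1024 * 1024)
image[300_000:300_005] = b"agent"
blob, index = compress_seekable(bytes(image))
piece = read_range(blob, index, 300_000, 5)
```

Mostly empty memory compresses very well, and the index lets a restore fetch pages on demand instead of inflating the whole image first. That combination is one plausible reason a 512 MB snapshot can shrink to a small fraction of its size and still restore quickly.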