AI Engineer · 1h 23m

Building your own software factory — Eric Zakariasson, Cursor

TL;DR

  • Cursor’s real ambition is a “software factory,” not just a better autocomplete — Eric Zakariasson frames the path from “spicy autocomplete” to Dan Shapiro’s level-6 “dark factory,” where humans provide intent and agents handle coding, testing, reviewing, and shipping.

  • The bottleneck shifts from writing code to designing systems agents can navigate and verify — Zakariasson says modular codebases, recognizable patterns like package.json start scripts, guardrails, and especially verifiable outputs via unit, integration, Playwright, and UI tests are what make autonomy actually work.

  • Rules are most useful when they emerge from failures, not when you install a giant preset pack — he argues Cursor rules are widely misunderstood: instead of loading every Next.js rule you can find, teams should create SOP-like rules only when agents repeatedly go off the rails.

  • Cursor is already operating pieces of this factory internally with cloud agents, automations, and review bots — examples included separate VMs per agent, Bugbot reviewing PRs, automated daily review summaries from Slack/GitHub, agents learning from merged PR comments, and Linear tickets that auto-spawn cloud agents.

  • The human job becomes manager-plus-architect, with less code reading and more scoping, parallelization, and trust calibration — Eric says he often runs 5-10 cloud agents at once, works across 4 repos or areas simultaneously, and spends more time planning synchronously while execution happens asynchronously.

  • Mission-critical systems still demand heavier upfront quality investment, not blind delegation — for brownfield or security-sensitive code, he recommends spending “a lot of compute and tokens” before human review, manually writing critical tests, and using security-specific automations like Cursor’s internal “security sentinel” on risky PRs.

The Breakdown

From autocomplete to the “dark factory”

Eric Zakariasson opens by saying Cursor itself isn’t fully a software factory yet, but parts of the company already run that way. He borrows Dan Shapiro’s six levels of autonomy — from “spicy autocomplete” to a black-box “dark factory” — and says most people are still around levels 2-3, while he increasingly works at level 4: delegating work to agents and reviewing outcomes more than code.

Why build a factory at all?

The pitch is simple: throughput, consistency, and better leverage of human taste. Agents can run 24/7, and if you build the assembly line right, you get repeatable output instead of the “probabilistic chaos” people complain about when agents wander off without enough structure.

The three ingredients: primitives, guardrails, enablers

Zakariasson breaks the factory into codebase primitives, agent guardrails, and capability enablers. A modular repo, recognizable usage patterns, and collocated files help agents orient quickly; rules, hooks, and tests keep them from touching dangerous areas like auth or encryption; and skills, MCPs, feature-flagging, and runnable environments let them actually do useful autonomous work.
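A guardrail of the kind described here can be as simple as a check that runs before an agent's change set is accepted. The sketch below is illustrative only: the path patterns and the function names `blocked_paths` and `check_agent_change` are assumptions, not Cursor's rules or hooks API. It flags any changed file that falls inside a protected area such as auth or crypto code.

```python
from fnmatch import fnmatchcase

# Hypothetical protected areas. The talk names auth and encryption as examples
# of code agents should not touch; these exact paths are assumptions.
PROTECTED_PATTERNS = ["src/auth/*", "src/crypto/*"]


def blocked_paths(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the changed files that fall inside a protected area."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


def check_agent_change(changed_files):
    """Decide whether an agent's change set may proceed without a human."""
    blocked = blocked_paths(changed_files)
    if blocked:
        return {"allowed": False, "needs_human_review": blocked}
    return {"allowed": True, "needs_human_review": []}


print(check_agent_change(["src/auth/login.py", "src/ui/button.py"]))
```

In a real setup a check like this would typically run as a pre-commit hook or CI step, so the agent gets a clear failure signal instead of silently editing sensitive code.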

A live Cursor 3 demo and the Ableton-style music app

He shows Cursor 3 — a complete rewrite with no VS Code underneath — designed for an agent-first workflow. Using a side project that mimics Ableton, he explains how he intentionally avoided writing code himself, forcing the agent to discover the package.json, start the dev server, and create its own Playwright end-to-end tests so it could verify things like the play button and note entry without him babysitting.
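The reason recognizable patterns matter is that an agent can find the entry point mechanically. As a rough sketch of that discovery step, the snippet below reads a package.json and picks the command to start the dev server. The preferred script names, the helper `find_dev_command`, and the sample package.json contents are all assumptions for illustration, not details from the demo.

```python
import json

# Script names to look for, in order of preference (an assumption for this sketch).
PREFERRED_SCRIPTS = ("dev", "start")


def find_dev_command(package_json_text, preferred=PREFERRED_SCRIPTS):
    """Return the npm command for the first preferred script, or None if absent."""
    scripts = json.loads(package_json_text).get("scripts", {})
    for name in preferred:
        if name in scripts:
            return f"npm run {name}"
    return None


# Hypothetical package.json for a small web app with Playwright tests.
example = '{"name": "music-app", "scripts": {"dev": "vite", "test:e2e": "playwright test"}}'
print(find_dev_command(example))  # prints "npm run dev"
```

A repo that hides its start command in an undocumented shell script gives the agent nothing comparable to look up, which is the practical argument for conventional scripts.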

Agents reviewing themselves — and recording proof

One of the stickiest moments is his demo of cloud agents on separate VMs, complete with computer-use tooling that records a video of the agent testing its own changes. Instead of asking a human to read every diff, the system can show a short clip of keyboard navigation, UI focus states, and the actual behavior change — a more manager-friendly artifact than raw code.

The mindset shift: from worker to manager

Once this infrastructure exists, he says the biggest change is psychological: you stop living in code and start managing fleets of agents asynchronously. The problems start sounding eerily like human org design — task scoping, parallelization, merge conflicts, manager-of-manager layers — and he says there’s no shortcut except “spawn a shitload of agents” and learn their strengths and weaknesses through repetition.

The internal automations that make Cursor feel factory-like

He walks through several concrete automations at Cursor: a daily review that summarizes his Slack and GitHub activity, a system that reads merged PR comments to capture high-signal human feedback, an “agentic code owner” that unblocks low-risk PRs while escalating high-risk ones, and a “continue learning” plugin that mines transcripts to automatically turn repeated corrections into rules.
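The idea of turning repeated corrections into rules can be sketched in a few lines. This is a deliberate simplification: it counts corrections that match after normalizing case and whitespace, whereas the plugin described in the talk mines transcripts and presumably groups similar feedback more intelligently. The function name `propose_rules`, the threshold, and the sample corrections are assumptions.

```python
from collections import Counter


def propose_rules(corrections, min_repeats=3):
    """Return corrections seen at least `min_repeats` times, most frequent first."""
    # Normalize case and whitespace so trivially different phrasings count together.
    normalized = (" ".join(text.lower().split()) for text in corrections)
    counts = Counter(normalized)
    return [text for text, count in counts.most_common() if count >= min_repeats]


# Hypothetical corrections a human gave to agents across several sessions.
corrections = [
    "Use pnpm, not npm",
    "use pnpm,  not npm",
    "Add a test for this",
    "Use pnpm, not npm",
]
print(propose_rules(corrections))  # prints ['use pnpm, not npm']
```

The threshold is the design choice that matters: a one-off correction stays a one-off, and only feedback that keeps recurring is promoted to a standing rule.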

The Q&A reality check: isolation, enterprise risk, and team design

In Q&A, he strongly recommends isolated environments over shared workspaces, even though separate VMs cost more, because they scale better and avoid unexpected side effects. He says Cursor likely runs multiple thousands of agents a day. He is blunt that architecture, safety, and critical systems still need heavy human judgment, better tests, and targeted automations like a security sentinel. Teams, meanwhile, start to look like hybrids of PMs, designers, analysts, and engineers who steer agentic workflows rather than hand-write every line.