Latent Space · 1h 9m

Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray

TL;DR

  • Devin’s usage hit an 80% tipping point — Walden says Devin-authored code in Cognition’s repos rose from 16% in January to 80% in March, while merged PR volume grew roughly 7x in 2-3 months.

  • The real challenge is testing, not computer use — Clicking UI elements is the easy part; the hard part is reasoning through how to run multi-service apps, enable flags, get the right permissions, and verify cross-stack changes end to end.

  • Out-of-the-box agents are safer, but much harder to build — Running the “brain” outside the sandbox protects secrets and supports cleaner permission boundaries, but forces you to manage state, orchestration, and more complex infra.

  • Most companies still aren’t actually ready for autonomous coding — Repo setup remains a bottleneck because many teams still rely on tribal knowledge like “go ask Bob for the secrets,” which breaks when an agent needs to boot and test a repo on its own.

  • AI coding quality now depends on guardrails as much as model intelligence — The guests call out repeated failure modes like backward-compatibility hacks, untyped tuples, getattr reward-hacking in Python, and codebase “slop” spreading from weak patterns unless linting and cleanup are enforced.

  • The fastest-growing use cases are outside classic engineering — SRE triage, customer support, PM-driven bug fixes in Slack, and internal knowledge workflows are all becoming agent-native because the value comes from integrating code, logs, docs, tickets, and chat in one loop.
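To make the guardrails point concrete, here is a minimal Python sketch of two of the failure modes named above: untyped tuples and getattr used to sidestep a real fix. This is one plausible reading of what the guests describe, not code from the episode; every name in it (User, Config, parse_user, timeout_loose, and so on) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple


# Pattern 1: an untyped tuple. Callers have to remember the positional
# order, and a type checker cannot tell name from age.
def parse_user_loose(raw: str):
    name, age = raw.split(",")
    return (name.strip(), int(age))


# Typed alternative: fields are named and visible to linters/type checkers.
class User(NamedTuple):
    name: str
    age: int


def parse_user(raw: str) -> User:
    name, age = raw.split(",")
    return User(name=name.strip(), age=int(age))


@dataclass
class Config:
    timeout_s: float = 30.0


# Pattern 2: getattr with a default hides a wrong attribute name.
# The code "passes" because no AttributeError is raised, but it silently
# ignores the configured value (hypothetical example of the shortcut).
def timeout_loose(cfg: Config) -> float:
    return getattr(cfg, "timeout", 5.0)


# Direct attribute access fails loudly if the name is wrong, which is
# what lets a lint or type-check step catch the mistake.
def timeout_strict(cfg: Config) -> float:
    return cfg.timeout_s
```

In this sketch the loose versions run without error, which is exactly why they can spread through a codebase; a lint rule or type-check step that rejects them is the kind of guardrail the bullet refers to.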

The Breakdown

Devin went from writing 16% of code in its own repos in January to 80% by March, while merged PRs grew 7x in a few months with barely any headcount growth — the clearest sign yet that background coding agents have crossed from novelty to serious infrastructure. Walden Yan and Cole Murray explain why the hard part isn’t clicking buttons but orchestrating real testing, repo setup, secrets, memory, and the messy company integrations that make autonomous coding actually work.
