Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
TL;DR
Notion’s custom agents were a 3-year grind, not a one-model unlock — Sarah Sachs and Simon Last said they’d been rebuilding the idea since late 2022, but only after function calling matured, context windows expanded, and models like Claude Sonnet 3.6/3.7 got reliable enough did the product finally become production-worthy.
The big product lesson was “give models what they want,” not what Notion wants — Simon described abandoning Notion-specific XML and complex JSON query formats in favor of markdown and SQLite-like queries because models handled those dramatically better, which became a core design principle for the whole stack.
Notion treats evals as a full product system, not a dashboard of pass/fail tests — Sachs broke evals into CI-style regression tests, launch report cards targeting 80-90% on key user journeys, and “frontier/headroom evals” intentionally stuck around 30% pass rate to expose what future models could unlock; they even staff a data scientist, eval engineer, and model behavior engineer on “Notion’s last exam.”
The company is reorganizing around agents as first-class users of the product — Sachs said the long-term assumption is that most Notion traffic will come from agents rather than humans, so the same teams that built CRDT collaboration, SQL infrastructure, and product surfaces now also own how agents edit blocks, query databases, and interact with the product safely.
They’re bullish on both CLIs and MCP, but for different jobs — Simon called CLIs powerful because agents can bootstrap and debug their own tooling in-terminal, while MCP is “the dumb simple thing that works” for narrow, tightly permissioned agents; Sarah added that Notion will keep investing in its MCP because being the enterprise system of record means meeting users where they are.
Notion’s core thesis is that agents should replace process, not people — Their most concrete examples were bug triage from Slack, tenant application processing, meeting-note-driven task creation, and manager agents supervising 30+ other agents, all aimed at removing bookkeeping while keeping humans focused on decisions and collaboration.
The Breakdown
Why the custom agent launch felt both overdue and right on time
The episode opens with the recent launch of Notion custom agents, and the mood is equal parts relief and “finally.” Sarah Sachs says Notion ships slowly on purpose, so by launch day the team is already two or three milestones ahead; still, this was their most successful launch yet in free trials and conversion, helped along by a 3-month free offer and the fact that users now instantly understand why AI tools are useful.
The 2022-era agent experiments that were simply too early
Simon Last says one of Notion’s first GPT-4-era instincts in late 2022 was already “give it all the Notion tools and let it work in the background.” But before function calling, they were hand-rolling tool frameworks, fine-tuning models with partners like Anthropic, OpenAI, and Fireworks, and basically banging their heads against models that were too dumb and too short-context to make the experience robust.
Sarah’s river metaphor for frontier product strategy
Sachs gives maybe the cleanest strategic framing of the conversation: the skill is knowing when you’re swimming upstream against model limitations versus when you’re just not building the right infrastructure, then figuring out “which direction the river is flowing.” That’s how she explains Notion’s AI roadmap — keep shipping what works now, but always carry a few “AGI-pilled” projects that feel obvious only 18 months later.
What actually changed: stop forcing Notion abstractions onto models
One of Simon’s most useful mini-histories is how Notion rebuilt the agent stack over and over. They started with a coding-agent-like JavaScript API approach, then a custom XML tool-calling format, then learned the hard lesson: models do better with markdown and SQLite-style queries than with beautiful internal abstractions optimized for Notion’s data model. His summary is blunt: “give the models what they want.”
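To make "give the models what they want" concrete, here is a minimal sketch of the SQLite-style side of that contrast. It assumes a Notion database is mirrored into an in-memory SQLite table, that the agent writes ordinary SQL against it, and that results go back into the model's context as a markdown table. The table name `tasks`, its columns, and the helper `rows_to_markdown` are hypothetical illustrations, not Notion's actual schema or internal API.

```python
import sqlite3

# Hypothetical mirror of a Notion database as a plain SQLite table.
# The schema (tasks: title, status, owner) is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (title TEXT, status TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        ("Fix login bug", "In progress", "Simon"),
        ("Write launch post", "Done", "Sarah"),
        ("Triage Slack reports", "Not started", "Simon"),
    ],
)

# The kind of query a model writes fluently, instead of a bespoke
# nested-JSON filter format designed around an internal data model.
query = (
    "SELECT title, status FROM tasks "
    "WHERE owner = ? AND status != 'Done' ORDER BY title"
)
cursor = conn.execute(query, ("Simon",))
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()


def rows_to_markdown(columns, rows):
    """Render query results as a markdown table for the model's context."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


markdown = rows_to_markdown(columns, rows)
print(markdown)
```

The point of the sketch is that both the query language and the output format are ones models have seen enormous amounts of in training, so nothing Notion-specific has to be learned from the prompt.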
How Notion runs AI teams without ego or turf wars
Sachs says her job isn’t to be the ideas person or the technical oracle — it’s to keep everyone aligned on the objective and able to prioritize. The culture she describes is unusually low-ego: engineers are expected to delete their own code, teams form after things ship rather than before, and the “Simon vortex” is a real internal mode where senior engineers swarm around prototypes moving too fast for org charts.
Demos over memos, and why evals became a whole discipline
Because everyone inside Notion uses Notion constantly, prototypes quickly become live internal tools, not slides. But that freedom creates a maintenance burden, so Notion built what Sachs calls an “agent dev velocity” platform: central eval frameworks, team-owned evals, nightly runs, CI hooks, and a new role — model behavior engineer — for people with weirdly perfect mixes of linguistics, product instinct, and prompt craft. They’ve even pushed toward agents writing evals for themselves.
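The three eval tiers Sachs describes (CI-style regression tests, launch report cards, and frontier/headroom evals) differ mainly in what pass rate counts as healthy. The following is a minimal sketch that assumes each suite reports one pass/fail per case. The 80% launch bar and the roughly 30% headroom level come from the episode. The must-pass rule for regression suites and the 0.5 "refresh" cutoff for headroom suites are illustrative assumptions, and the `Tier`, `SuiteResult`, and `assess` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    REGRESSION = "regression"  # CI-style checks that should not break
    LAUNCH = "launch"          # report card for key user journeys
    HEADROOM = "headroom"      # frontier evals kept deliberately hard


@dataclass
class SuiteResult:
    name: str
    tier: Tier
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def assess(result: SuiteResult) -> str:
    """Interpret a pass rate according to the tier's (assumed) expectations."""
    rate = result.pass_rate
    if result.tier is Tier.REGRESSION:
        # Assumption: any regression failure blocks the change.
        return "ok" if result.passed == result.total else "blocked"
    if result.tier is Tier.LAUNCH:
        # Episode target is 80-90% on key journeys; this sketch simply
        # treats anything at or above 80% as launch-ready.
        return "ready" if rate >= 0.80 else "not ready"
    # Headroom evals are meant to hover near 30%. A big jump suggests a
    # new model unlocked something and the suite needs harder cases.
    # The 0.5 cutoff is an illustrative assumption.
    return "refresh suite" if rate >= 0.50 else "useful headroom"


# Hypothetical suites and numbers, for illustration only.
results = [
    SuiteResult("block-editing", Tier.REGRESSION, 120, 120),
    SuiteResult("database-query journeys", Tier.LAUNCH, 43, 50),
    SuiteResult("frontier-tasks", Tier.HEADROOM, 9, 30),
]
for r in results:
    print(f"{r.name}: {r.pass_rate:.0%} -> {assess(r)}")
```

Keeping the tier on each result, rather than using one global threshold, is what lets a 30% score be "healthy" in one suite and a launch blocker in another.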
The software factory, coding agents, and the future software engineer
Simon is clearly obsessed with coding agents and the “software factory” idea: a system where agents develop, debug, review, merge, and maintain code with humans supervising the outer loop. Both guests describe software engineers shifting from typing every line to managing streams of agent work — not quite becoming managers, but definitely climbing the abstraction ladder and living in a much more technical version of delegation.
A live demo of agents doing real operations work
The conversation gets especially concrete when Alessio demos a Notion agent he made for Kernel Labs tenant applications: ingesting inbox submissions, creating rows in a database, and enriching them with web search. That leads into Notion’s own internal examples — bug triage agents in Slack, 30-plus go-to-market agents supervised by a manager agent, and a philosophy that agents should use Notion primitives like pages and databases as memory rather than inventing exotic new abstractions.
MCP, CLIs, pricing, and the hard economics of agent products
Late in the episode, Simon argues CLIs are amazing because agents can build and repair their own capabilities in-terminal, while MCP is best for narrow, permissioned, lightweight agents. Sachs connects that to pricing: Notion uses credits rather than raw tokens because web search, GPUs, open-source models, and async processing all cost differently, and she’s adamant that pricing should reflect useful capability rather than wasteful token burn.
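A credit system can be pictured as one conversion layer sitting in front of heterogeneous costs. Here is a minimal sketch assuming each billable resource has its own credit rate. Every resource name and rate below is invented for illustration and does not reflect Notion's actual pricing. The `credits_for_run` helper is likewise hypothetical.

```python
# Hypothetical credit rates per unit of each resource; NOT Notion's real prices.
CREDIT_RATES = {
    "frontier_model_1k_tokens": 2.0,
    "open_source_model_1k_tokens": 0.5,
    "async_batch_1k_tokens": 1.0,
    "web_search_call": 5.0,
    "gpu_second": 0.1,
}


def credits_for_run(usage: dict[str, float]) -> float:
    """Convert one agent run's mixed resource usage into a single credit total."""
    unknown = set(usage) - set(CREDIT_RATES)
    if unknown:
        # Refuse to bill resources that have no agreed rate.
        raise ValueError(f"no credit rate for: {sorted(unknown)}")
    return sum(CREDIT_RATES[res] * amount for res, amount in usage.items())


# Example run mixing a frontier model, an open-source model, and web search.
run = {
    "frontier_model_1k_tokens": 12,
    "open_source_model_1k_tokens": 40,
    "web_search_call": 3,
}
print(credits_for_run(run))  # 24.0 + 20.0 + 15.0
```

Under this shape, routing work to cheaper models or async processing lowers the bill without changing the unit the user sees, which is one plausible way to price on delivered capability rather than raw token volume.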
Meeting notes as data capture, and the bigger future-of-work thesis
The final stretch ties everything back to Notion’s core identity: it wants to be the system of record for enterprise work, not just an AI wrapper. Sachs says meeting notes became one of Notion’s strongest growth and retention loops because they capture the actual substance of work; Simon adds details like summaries that @-mention the right “Simon” based on attendees and profile matching. The team’s dream workflow is meetings with hands off the keyboard, where agents prep the preread, capture the discussion, and turn decisions into tasks and follow-ups automatically.