0xSero · May 25, 2026 · 43m

Smarter AI Agents with Repo Prompt - With Eric Provencher

TL;DR

Context beats autonomy — Provencher’s core claim is that agents perform best when you aggressively curate what they see, because wasting 60k+ tokens on ad hoc searching leads to drift, missed files, and shallow edits.
Repo Prompt now acts like an orchestration layer, not just a prompt tool — the flow he describes is agent → context builder → oracle → plan file → sub-agents, with stateful tools that track file slices and rebase edits when multiple agents touch the same file.
GPT models win for instruction-following; Claude wins for nimbleness — he trusts OpenAI’s models more to obey rules like project instructions and structured workflows, while Claude is better for fast, local steering but more likely to skip parts of a spec.
Long sessions don’t have to compact if delegation is good enough — Provencher says he can run four- to five-hour threads without compaction by having a manager agent orchestrate isolated sub-agents instead of stuffing every action into one bloated context window.
Open-weight models still lag hardest on tool use and workflow fidelity — his simple benchmark is whether a model remembers to set the chat title at thread start, and he says many open models fail even that, though Kimi, GLM, and DeepSeek V4 Flash stand out.
Most users should audit their harness before chasing the next model — his practical advice is to inventory every MCP, plugin, CLI tool, and skill exposed to the model, strip it to the minimum, and iterate on prompts by watching traces rather than trusting first drafts.
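The harness audit in the last point can be sketched in a few lines of Python. This is only an illustration of the idea, not Repo Prompt's tooling: the `ExposedTool` record, the `audit_harness` helper, the tool names, and every number below are hypothetical placeholders, and the "drop anything the traces never call" rule is one assumed way to decide what counts as the minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposedTool:
    """One entry in the inventory of things the model can see (hypothetical shape)."""
    name: str             # illustrative tool name
    source: str           # "mcp", "plugin", "cli", or "skill"
    prompt_tokens: int    # rough cost of the tool's description in the context window
    calls_in_traces: int  # how often recent traces actually invoked it


def audit_harness(tools, min_calls=1):
    """Split tools into keep/drop lists and report the tokens freed by dropping."""
    keep = [t for t in tools if t.calls_in_traces >= min_calls]
    drop = [t for t in tools if t.calls_in_traces < min_calls]
    freed = sum(t.prompt_tokens for t in drop)
    return keep, drop, freed


# Placeholder inventory: names and counts are made up for the example.
inventory = [
    ExposedTool("read_file", "mcp", 180, 412),
    ExposedTool("apply_edits", "mcp", 240, 97),
    ExposedTool("jira_lookup", "plugin", 650, 0),
    ExposedTool("docker_cli", "cli", 900, 0),
    ExposedTool("changelog_skill", "skill", 320, 2),
]

keep, drop, freed = audit_harness(inventory)
print("keep:", [t.name for t in keep])
print("drop:", [t.name for t in drop])
print("tokens freed per request:", freed)
```

In practice the usage counts would come from reading real traces, which is the same habit the advice recommends for iterating on prompts.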

The Breakdown

Eric Provencher says the real frontier for coding agents is not a better chatbot but a better context pipeline: collect exactly the right slices of a repo, hand them to a strong reasoning model, and let sub-agents work in clean isolated threads for hours without collapsing into junk. He explains how Repo Prompt evolved from XML patching into an MCP-driven orchestration layer that can plan, review, and ship code with far less manual checking.
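A rough sketch of what such a context pipeline might look like, under stated assumptions: slices are picked greedily by a relevance score until a token budget is reached, then grouped so each sub-agent thread only sees the slices for its own file. The `FileSlice` type, the scores, the budget, and the one-thread-per-file grouping are all hypothetical choices made for illustration, not a description of how Repo Prompt works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSlice:
    """A line range of one file, as a context builder might select it (hypothetical)."""
    path: str
    start: int        # first line, 1-indexed, inclusive
    end: int          # last line, inclusive
    relevance: float  # assumed score from an earlier search/ranking step
    tokens: int       # estimated token cost of the slice


def build_context(slices, budget):
    """Greedily keep the most relevant slices that still fit in the token budget."""
    chosen, used = [], 0
    for s in sorted(slices, key=lambda s: s.relevance, reverse=True):
        if used + s.tokens <= budget:
            chosen.append(s)
            used += s.tokens
    return chosen, used


def plan_subagent_threads(chosen):
    """Group slices by file so each isolated thread gets only its own material."""
    threads = {}
    for s in chosen:
        threads.setdefault(s.path, []).append(s)
    return threads


# Placeholder candidates: paths, scores, and token counts are made up.
candidates = [
    FileSlice("auth/session.py", 10, 80, 0.9, 700),
    FileSlice("auth/session.py", 200, 260, 0.4, 600),
    FileSlice("api/routes.py", 1, 120, 0.8, 1200),
    FileSlice("docs/CHANGELOG.md", 1, 400, 0.1, 4000),
]

chosen, used = build_context(candidates, budget=2500)
threads = plan_subagent_threads(chosen)
for path, parts in threads.items():
    print(path, [(p.start, p.end) for p in parts])
print("tokens used:", used)
```

The low-relevance changelog never enters the context, which is the point of curating slices up front instead of letting an agent spend its budget searching.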