AI Engineer · June 25, 2026 · 23m

Recursive Coding Agents - Raymond Weitekamp, OpenProse

TL;DR

RLMs as the new test-time compute paradigm: By marrying chain-of-thought reasoning with code execution, RLMs allow models to symbolically explore massive contexts and decompose problems recursively.
Small models beat giants with RLMs: Qwen 3.5 9B, running on a laptop inside an RLM harness, outperformed GPT-5.4 and Opus on the Long CoT benchmark.
RLMs stir benchmark drama: The Symbolica team scored over 30% on ARC-AGI-3 within hours using an RLM harness, prompting the ARC Prize team to dismiss the results and refuse private evaluation.
Recursive coding agents apply RLM principles to agent frameworks: When a coding agent can call itself recursively (e.g., ypi), it can decompose its own tasks and verify sub-agent outputs.
Claude Code's dynamic workflows now make it an RLM: Anthropic's addition of dynamic workflows turned Claude Code into a recursive agent system, settling earlier debates.
OpenProse turns any coding agent into an RLM: Using a markdown-based language, developers can declare sub-agent workflows, dependencies, and verification steps, capturing golden sessions for reuse.

The Breakdown

Recursive language models (RLMs) unify tool calling and reasoning, enabling small models to beat frontier LLMs on long tasks, but the real breakthrough is making coding agents reliable enough to trust with outcomes like full SaaS apps or, carefully, crypto wallets.
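
To make the recursive decomposition idea concrete, here is a minimal Python sketch of the control flow, not the speaker's implementation and not the API of OpenProse, ypi, or Claude Code. It assumes the "model" can be stood in for by a plain function: `solve_leaf` plays the role of a sub-agent call on a small slice of context, `combine` merges sub-results, and `verify` is the check applied to every sub-agent output. All of these names are hypothetical.

```python
from typing import Callable, Sequence


def recursive_solve(
    task: str,
    context: Sequence[str],
    solve_leaf: Callable[[str, Sequence[str]], int],
    combine: Callable[[list], int],
    verify: Callable[[int], bool],
    max_chunk: int = 500,
) -> int:
    """Solve `task` over `context`, splitting the context in half and
    recursing whenever it is too large for a single leaf call."""
    if len(context) <= max(1, max_chunk):
        # Small enough: hand the slice to one (stubbed) sub-agent call.
        result = solve_leaf(task, context)
    else:
        # Too large: decompose, solve each half recursively, then merge.
        mid = len(context) // 2
        parts = [
            recursive_solve(task, half, solve_leaf, combine, verify, max_chunk)
            for half in (context[:mid], context[mid:])
        ]
        result = combine(parts)
    # Check every result, leaf or merged, before passing it upward.
    if not verify(result):
        raise ValueError(f"result failed verification: {result!r}")
    return result


def count_matches(task: str, lines: Sequence[str]) -> int:
    """Stub leaf solver (an assumption for this demo): counts the lines
    containing the task string. A real harness would call a model here."""
    return sum(1 for line in lines if task in line)


if __name__ == "__main__":
    log = [f"line {i} " + ("ERROR" if i % 7 == 0 else "ok") for i in range(10_000)]
    total = recursive_solve(
        "ERROR", log, count_matches, sum,
        verify=lambda r: isinstance(r, int) and r >= 0,
    )
    print(total)
```

The point of the sketch is the shape: no single call ever sees more than `max_chunk` lines, and a failed check stops a bad sub-result from propagating. In an actual recursive coding agent, the split strategy would be chosen by the model itself rather than fixed at a midpoint.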