Jo Van Eyck · 13m

Cut your LLM token bill in half with these 2 simple tricks.

TL;DR

  • 'Caveman mode' really does cut token use — Jo shows that a single instruction like “Respond like a terse smart man, all technical, no fluff” dropped a simple FizzBuzz run from 7.6K tokens to 5.8K, and cites repo examples where replies shrink from roughly 70 tokens to 20.

  • The win is mostly from less English, not better code — his standing agent rule is to be very succinct and not explain generated code, which matters when your coding assistant spends lots of tokens narrating what it just did.

  • RTK attacks a bigger hidden cost: verbose tool output — the Rust Token Killer hooks into agent lifecycle events and rewrites commands like git log, test runs, and other CLI output into much more compact summaries before they hit the model.

  • In a realistic backend task, RTK beat the caveman trick — on a non-trivial e-shop feature in a large but well-structured codebase, message-window usage fell from about 19.1K tokens to 12K with RTK, a savings Jo calls a “hard yes” for daily use.

  • These optimizations are framed as both cost control and waste reduction — Jo ties the whole exercise to Anthropic rate changes, capped usage bugs, and the broader point that we shouldn’t “boil the ocean” while experimenting with AI coding agents.

  • He’s practical, not claiming a rigorous eval — the benchmarks were only run a few times, not as a full evaluation, but they still landed at a 24% output-token drop for caveman mode on FizzBuzz and nearly 40% for RTK in his normal workflow.

The Breakdown

Why Token Frugality Suddenly Matters

Jo opens with a joke — “Me write spec, you code, no mistake” — then quickly turns serious: token budgets are worth watching now that rates are changing, subsidized tokens are disappearing, and Anthropic had bugs where caps kicked in early. His framing is simple: experiment all you want, just don’t “boil the ocean” if there are easy ways to waste less.

The Ridiculous Repo That Just Tells Models to Be Terse

The first trick is almost comically small: a GitHub repo whose core instruction is basically, “Respond like a terse smart man, all technical, no fluff.” Jo laughs at the idea that this needs a whole repo, but the reported effect is real — if your agent normally spends lots of tokens on English explanations, forcing it to speak in clipped, sparse language meaningfully cuts output.

Jo’s Own Agent Prompt Has Used This for a Year

He reveals his “super secret” agents.md, and the first rule is essentially the same caveman idea, just phrased a bit more professionally: be succinct and don’t explain the generated code. He also slips in a practical tip: put a version number in agents.md and have the agent echo it back so you know prompt changes actually took effect.
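
Jo doesn't show the full file on screen, but based on his description, the relevant part of an agents.md might look roughly like this (a hypothetical paraphrase, not his actual file):

```markdown
<!-- agents.md (illustrative sketch, not Jo's real file) -->

PROMPT_VERSION: 7

## Rules
- Echo PROMPT_VERSION in your first reply so I can confirm this file was loaded.
- Be very succinct. Do not explain generated code unless explicitly asked.
```

The version echo is the cheap sanity check: if the agent's first reply doesn't repeat the number, the prompt file wasn't picked up.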

FizzBuzz Test: 5.8K Tokens vs. 7.6K

To make the comparison concrete, Jo uses GitHub Copilot CLI instead of Claude Code and asks it to implement FizzBuzz in small iterative steps. In caveman mode the run uses about 5.8K tokens; without that terse system instruction it climbs to 7.6K, which is enough for him to say, yes, this silly trick “absolutely does” work.
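
For scale, the task itself is only a few lines of code; a standard Python solution is shown below, which makes the 1.8K-token gap all the more clearly a matter of explanation overhead rather than code:

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 16):
        print(fizzbuzz(i))
```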

RTK: A Birthday Gift That Hooks the Agent Pipeline

The second technique comes from his colleague Kristoff, who pointed him to RTK, the “Rust Token Killer.” RTK plugs into coding agents through hooks — scripts that fire on lifecycle events like tool calls or completion — so when the agent runs git, tests, Python, or pip, RTK intercepts the output and compresses the noisy parts before the model sees them.
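
The hook pattern itself is simple to sketch. The following is a minimal conceptual model in Python, not RTK's actual implementation (RTK is Rust and wires into the agent's own hook mechanism); it just shows the idea of intercepting a tool call and compressing its output before it reaches the model:

```python
import subprocess
from typing import Callable

# Registry mapping a tool name to an output compressor.
# (Illustrative only — names and structure are this sketch's, not RTK's.)
COMPRESSORS: dict[str, Callable[[str], str]] = {}


def compressor(tool: str):
    """Decorator registering a compressor for a given tool."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        COMPRESSORS[tool] = fn
        return fn
    return register


@compressor("git")
def compress_git(output: str) -> str:
    # Drop blank lines and cap how much ever reaches the model.
    lines = [line for line in output.splitlines() if line.strip()]
    return "\n".join(lines[:50])


def run_tool(argv: list[str]) -> str:
    """Lifecycle hook: run the tool, then compress before the model sees it."""
    out = subprocess.run(argv, capture_output=True, text=True).stdout
    fn = COMPRESSORS.get(argv[0])
    return fn(out) if fn else out
```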

The Real Magic Is Shrinking Tool Chatter Like git log

Jo shows the core problem with a command like git log: every verbose line gets shoved into the LLM context. RTK rewrites those commands by prefixing them through its binary, producing output that is functionally equivalent but dramatically more compact — and if your favorite CLI tool isn’t supported yet, he says it only takes a few minutes to configure.
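
To see why this matters, consider what the rewrite does to git log specifically. The sketch below (my illustration of the idea, not RTK's code) collapses full multi-line commit entries down to one "short hash + subject" line each, which is functionally what the model needs:

```python
import re


def compact_git_log(raw: str) -> str:
    """Compress full `git log` output to one line per commit:
    short hash plus subject. A sketch of RTK's idea, not its code."""
    out = []
    sha = None
    for line in raw.splitlines():
        m = re.match(r"commit ([0-9a-f]{7,40})", line)
        if m:
            sha = m.group(1)[:7]
        elif sha and line.startswith("    "):
            # First indented line after the header is the commit subject.
            out.append(f"{sha} {line.strip()}")
            sha = None  # skip the rest of the commit body
    return "\n".join(out)
```

Each Author/Date/body block disappears entirely, which is where the bulk of the verbose tokens live.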

Real Workflow Test: 19.1K Down to 12K

For the serious benchmark, he switches from toy examples to the way he actually works now: giving an agent a fairly complete backend story in a sizable e-shop codebase with acceptance criteria and tests. Without RTK, one sub-agent fills the message window to about 19,100 tokens; with RTK, that drops to roughly 12,000, which he calls an “impressive drop” and an obvious addition to his setup.

Final Numbers, With a Bit of Healthy Humility

Jo closes by stressing these are experiments, not a full-blown eval — he only ran them a few times because, again, he doesn’t want to “boil the oceans” for a YouTube video. Still, the takeaway is crisp: caveman mode cut output tokens by about 24% on FizzBuzz, RTK cut token use by almost 40% in a realistic backend scenario, and he’s now sold on both.
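
Those headline percentages check out against the raw token counts he reports:

```python
def pct_drop(before: int, after: int) -> float:
    """Percentage reduction from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


print(pct_drop(7_600, 5_800))    # caveman mode on FizzBuzz -> 23.7
print(pct_drop(19_100, 12_000))  # RTK on the backend story  -> 37.2
```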