Your Agent Is Wasting Tokens and You Don't Know It - Erik Hanchett, AWS
TL;DR
Cache your system prompts: On the first call, send the full prompt; on subsequent calls, the cached version reduces token usage dramatically.
Route by difficulty: Use cheaper models like Claude Haiku for simple tasks and reserve frontier models like Claude Sonnet for complex ones, or even use a cheap model to decide routing.
Offload large tool results: Store results locally or in the cloud and send summaries instead of re-injecting the full payload into every agent loop iteration.
Cap your tool loops: Without a max iteration limit, agents can call tools 10-20 times or hit infinite loops, burning through tokens unpredictably.
Trim conversation history: Use a sliding window to send only the last N messages, and summarize older context rather than re-sending entire conversation histories.
The Breakdown
Erik Hanchett from AWS reveals five concrete techniques to slash token costs when building AI agents, from caching system prompts to capping tool loops that can otherwise spiral into expensive infinite cycles.
Was This Useful?
Share
Keep Reading
Make Alcreon Yours
Tune your feedFive quick questions, and the feed ranks what matters to you first.Or just get notified
The weekly Echo. Signal worth keeping in your inbox.
Every new piece, announced on X.
Read Next
See all
Playbook
The Cheapest Model That Passes
OpenRouter lists 400 models behind one API. The fix for choosing isn't a better leaderboard, it's a four-step protocol that ends in a real eval.

Playbook
Cheap Models, Hard Tasks
Most agent workflows route every step to the frontier model by default. The bill scales with how chatty the agent gets, even when most steps don't need that brain.

Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.