AI EngineerJune 28, 20265m

Your Agent Is Wasting Tokens and You Don't Know It - Erik Hanchett, AWS

TL;DR

Cache your system prompts: On the first call, send the full prompt; on subsequent calls, the cached version reduces token usage dramatically.
Route by difficulty: Use cheaper models like Claude Haiku for simple tasks and reserve frontier models like Claude Sonnet for complex ones, or even use a cheap model to decide routing.
Offload large tool results: Store results locally or in the cloud and send summaries instead of re-injecting the full payload into every agent loop iteration.
Cap your tool loops: Without a max iteration limit, agents can call tools 10-20 times or hit infinite loops, burning through tokens unpredictably.
Trim conversation history: Use a sliding window to send only the last N messages, and summarize older context rather than re-sending entire conversation histories.

The Breakdown

Erik Hanchett from AWS reveals five concrete techniques to slash token costs when building AI agents, from caching system prompts to capping tool loops that can otherwise spiral into expensive infinite cycles.