The Artificial Intelligence Show Podcast · 20m

Why No One Has Figured Out AI Pricing Yet

TL;DR

  • Token-based billing is the core problem: Input and output tokens are priced differently (output costs 2-5x more), and agentic AI loops resend entire conversation history as input on every turn, causing costs to skyrocket with volume.

  • Agent loops are the hidden cost multiplier: AI coding agents and customer support assistants resend large context (like a 20,000-token knowledge base) on every request. At 1,000 queries per day, that adds up to 20 million input tokens per day, costing $60 just to reread static content.

  • Prompt caching helps but varies wildly by provider: Anthropic requires developers to mark cacheable content manually, OpenAI caches automatically, and Google has two modes. Caching is the main way to reduce API costs, but most businesses on per-seat plans can't access these levers.

  • Per-seat licensing and pooled usage add another layer of confusion: Claude Team gives individual usage limits, Claude Enterprise pools them, Gemini Enterprise pools by default with clear quotas, and OpenAI works differently; you can have eight different billing behaviors even within one provider.

  • Labs want to sell 24/7 agentic loops, but can't price them predictably: The vision is a 300-person marketing team run by 15-30 people with autonomous agents burning tokens continuously, but enterprises can't plan for that without cost predictability, and most executives can't even grasp current model capabilities.

  • No simple solution exists because adoption and literacy are mismatched: Raising per-seat prices to $500/month works for advanced users like SmarterX but fails for most enterprises that haven't seen ROI yet; labs can't market human replacement costs, and the average knowledge worker has no framework to understand token economics.
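The token arithmetic in the bullets above can be checked with a short Python sketch. The input price of $3.00 per million tokens is not quoted in the episode summary; it is inferred here from the $60 and 20 million token figures. The function names and the 500 tokens added per turn are hypothetical, chosen only to show how resending history makes input volume grow with each turn.

```python
def static_context_cost(context_tokens, queries_per_day, usd_per_million_input):
    """Daily input tokens and cost of resending the same static context on every query."""
    total_tokens = context_tokens * queries_per_day
    cost_usd = total_tokens / 1_000_000 * usd_per_million_input
    return total_tokens, cost_usd


def agent_loop_input_tokens(base_context_tokens, tokens_added_per_turn, turns):
    """Total input tokens billed when every turn resends the full history so far."""
    total = 0
    history = base_context_tokens
    for _ in range(turns):
        total += history                   # the whole history is billed as input again
        history += tokens_added_per_turn   # the history grows after each turn
    return total


# Figures from the summary: 20,000-token knowledge base, 1,000 queries per day.
# The $3.00 per million input tokens is an inferred assumption, not a quoted price.
tokens, cost = static_context_cost(20_000, 1_000, 3.00)
print(f"{tokens:,} input tokens per day, ${cost:.2f}")

# Hypothetical 10-turn agent loop that adds 500 tokens of history per turn.
print(f"{agent_loop_input_tokens(20_000, 500, 10):,} input tokens for one task")
```

Under these assumptions, a single 10-turn task bills the 20,000-token context ten times over, which is why caching the static portion is the first lever the summary points to.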

The Breakdown

AI pricing is a chaotic, developer-focused mess that leaves enterprises struggling with exploding token costs and no clear path to predictability. Companies like RBC and Cisco report usage jumps of 500% and levels described as "pretty crazy," while the labs themselves seem to be making it up as they go.
