Theo - t3.gg · 39m

I exploited Copilot and burned $46,000 (it cost $40)

The Breakdown

{ "tldr": [ "Theo says Copilot’s old pricing was fundamentally broken: GitHub’s $40 Copilot Plus plan gave him 1,500 “messages,” but he estimates his first 5% of usage had already cost Microsoft at least $550 in inference.", "The core exploit is simple: one 'message' can secretly become hours of agentic compute. Theo shows a single long-running puzzle-solving request hitting 111 million input tokens, 1.6 million output tokens, and roughly $62 with caching, even though it still counts as just one message.", "He built an automated setup to deliberately burn Copilot credits at scale: running 50 staggered sessions against intentionally unsolvable cryptography prompts, he says 60 messages cost about $545 in total, or roughly $10 per message, with some runs reaching the $60+ range.", "His broader argument is that message-based billing died when tools became agents: once Copilot, Claude Code, Cursor, and Codex started chaining tool calls and repeated API requests, the cost of a 'message' stopped being predictable and started ranging from pennies to potentially $30 or more.", "“This isn't a rug pull” is his main thesis: Theo argues Microsoft isn’t suddenly getting greedy; they waited too long to close a loophole that let users extract wildly more compute than they paid for, while GPU capacity could have been sold elsewhere.", "He uses T3 Chat as the cautionary tale: after seeing users burn $200+ in a few days and Claude traffic become 10x more expensive than everything else combined, his own team had to abandon simple message quotas because “selling messages is a suicide mission.”" ], "breakdown": "### From Copilot nostalgia to a pricing backlash\n\nTheo opens by contrasting old Copilot (basically autocomplete) with today’s agentic coding tools battling Cursor, Claude Code, and Codex. 
The spark for the video is GitHub’s pricing shift away from fixed monthly messages and toward token-based limits, which triggered outrage he thinks misses the actual economics.\n\n### Theo’s long-running hobby: wasting Microsoft’s money\n\nHe frames himself as unusually qualified here because he’s already burned through huge Azure credits and even built hourly benchmarks to prove Azure inference was painfully slow — at one point P90 latency was 21x worse than OpenAI. That benchmark went viral, Microsoft escalated internally, and he says Azure eventually fixed it so thoroughly that it’s now faster than OpenAI for him.\n\n### The four billing models every AI user should understand\n\nTheo walks through subscriptions with rate limits, subscriptions with message limits, spend-limit hybrids like Cursor, and raw API billing per token, before landing on dedicated compute — the way labs and big enterprises really think. His big point: most users are emotionally attached to usage patterns, not actual compute costs, so they freak out when billing changes even if they never understood what they were consuming.\n\n### T3 Chat learned the hard lesson first\n\nHe uses his own product as the case study: T3 Chat originally offered 1,500 cheap-model messages and 100 premium messages, but Sonnet quickly became a money pit, generating a third of traffic and 10x the cost of everything else. Some users cost them $200+ in days, forcing changes that hurt revenue and optics, but taught him that a message is not a stable unit of value — his analogy is paying “a million dollars for five cars” without knowing whether they’re Ferraris or broken 2001 Subarus.\n\n### Why agents destroy message-based pricing\n\nThe crux is that a message used to mean one request and one response, but now a single prompt can trigger search, tool use, repeated reasoning, and dozens or hundreds of follow-up API calls. 
Theo says T3 Chat’s cost spread might be a cent to $1 or $2, while Copilot’s can jump from a cent to $30 or more because there doesn’t appear to be a meaningful hard cap on how long agentic runs can go.\n\n### The cryptography puzzles that became a billing weapon\n\nTheo explains his DEF CON-inspired puzzle obsession and shows how GPT could spend 81 minutes solving one custom puzzle, and much longer on harder versions. He got especially excited by runs lasting 157 minutes, then 16 hours and 10 minutes, because that proved a “single message” could hide absurd amounts of compute if the model kept grinding away.\n\n### Turning Copilot into Microsoft’s worst-case scenario\n\nOn Copilot, he notices model multipliers are weirdly permissive: GPT-5.4 is 1x, GPT-5.5 is 7.5x, Opus 4.7 is 15x, and reasoning level apparently doesn’t change message burn. So he creates prompt files with cryptography challenges, adds restrictions to stop the model from cheating, changes one letter to make them unsolvable, then launches 50 staggered sessions from a remote machine to dodge extra rate limits Microsoft has started adding.\n\n### His verdict: users got addicted to a loophole\n\nTheo says this isn’t proof Microsoft is broke or evil — it’s proof they let an unsustainable subsidy survive way too long. With GitHub moving on June 1 to AI credits based on input, output, and cache tokens, his conclusion is blunt: if you were getting $40,000 of inference from a $40 plan, the problem wasn’t the new pricing, it was that the old system should never have existed this long.", "oneLiner": "Theo uses cryptography puzzles to intentionally rack up huge Copilot inference bills and argue that GitHub’s message-based pricing wasn’t a bait-and-switch — it was an obviously broken loophole made impossible by agentic AI.", "tags": ["commentary", "industry", "product"] }
