Nerd Snipe · 1h 17m

We filmed this before the ban

TL;DR

  • Fable impressed them enough to spend nearly $12,000 on inference in days: Between roughly $10,000 from one host and about $1,800 from the other, they pushed Fable hard and came away thinking it is a genuinely top-tier coding model.

  • They think Fable is basically the real Mythos, not a crippled public variant: Instead of heavily altering the weights, as they had feared, Anthropic appears to have left the model strong and put the restrictions in front of it through rerouting, prompt changes, and other control layers.

  • For real coding work, they rate Fable near GPT-5.5 but with better taste: Their main praise is not just for capability but for code quality, API design, UI judgment, comments, and readability, with examples ranging from Effect refactors to multiplayer games and an iPad article-markup app.

  • Anthropic's benchmark story gets hammered: They call SWE-bench deeply compromised because it uses merged PR descriptions from public GitHub repos that are likely in training data, and they also say Frontier Code's scoring patterns look suspiciously random.

  • The release terms are brutal for serious users: Fable disappears from subscriptions after June 22, costs $25 per million input tokens and $50 per million output tokens after Anthropic cut the launch pricing, and flagged prompts can trigger two-year retention instead of 30 days.

  • The biggest outrage is hidden degradation for AI and ML prompts: Anthropic initially used silent prompt modification, steering vectors, or parameter-efficient fine-tuning to make Fable worse on frontier model development tasks, then reversed course after backlash and made those reroutes visible.

The Breakdown

Anthropic gave people 10 days of a model the hosts think is roughly GPT-5.5 class with better taste, then wrapped it in bizarre safeguards that quietly rewrote or rerouted prompts and made enterprise use almost impossible. The result is a glowing review of Fable's coding quality and a brutal critique of Anthropic's safety culture, retention policy, and willingness to make the model secretly worse for certain research tasks.
