How I AI · 17m

Claude Fable 5 (Mythos) - is the world’s best coding model as good as they say?

TL;DR

  • Fable 5 looks like a benchmark monster, and mostly feels like one: Claire says Anthropic's 80% on SWE-bench Pro and its lead over Opus 4.8, GPT-5.5, and Gemini 3.1 Pro matched her experience that the model rarely failed technically on hard coding tasks.

  • It is priced and designed like a heavy-duty model: At $10 per million input tokens and $50 per million output tokens, with roughly 2x the token usage of other models, Fable 5 is costly enough that Claire argues you should save it for tasks that actually need this much reasoning.

  • Vision was the biggest pleasant surprise: In a simple but revealing test generating second-grade handwriting worksheets from classic texts, Fable 5 beat Opus 4.8 on spacing, readability, and document layout, which made Claire think it is notably better on PDF and document formatting work.

  • Its prose is so thorough that it becomes hard to use: When Claire asked it to review requirements for her open-source ChatPRD product graph project, it produced long, deeply detailed markdown that felt intelligent but was tough to parse, full of dense blocks and internal references.

  • Front-end and design work were the biggest miss: A one-shot request to design a skills registry produced what Claire called fundamentally terrible UI, with gray, black, and red outlines and generally ugly output, even after more detailed prompting.

  • Anthropic's safety setup changes the product experience: Fable 5 includes classifiers for cybersecurity, biology, chemistry, and distillation, and instead of hard-blocking a request, it can gracefully fall back to Opus 4.8; Anthropic says 95% of sessions never hit that fallback.
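
To make the pricing point concrete, here is a rough cost sketch in Python. It assumes the quoted $10 and $50 figures are per million tokens (the usual unit for model pricing) and that the "roughly 2x" token usage applies equally to input and output, which is a simplification. The function name and the sample token counts are hypothetical, not figures from the episode.

```python
# Hypothetical cost sketch. Assumption: prices are USD per million tokens.
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0
# Assumption: the "roughly 2x" token usage scales input and output alike.
TOKEN_MULTIPLIER = 2.0


def estimate_cost(input_tokens, output_tokens, multiplier=TOKEN_MULTIPLIER):
    """Estimate the USD cost of one task.

    input_tokens and output_tokens are what a typical model would use;
    multiplier scales both to reflect heavier token usage.
    """
    scaled_in = input_tokens * multiplier
    scaled_out = output_tokens * multiplier
    total = scaled_in * INPUT_PRICE_PER_M + scaled_out * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# A task that would normally take 200k input and 20k output tokens.
baseline = estimate_cost(200_000, 20_000, multiplier=1.0)
heavy = estimate_cost(200_000, 20_000)
print(f"same prices, 1x tokens: ${baseline:.2f}")
print(f"same prices, 2x tokens: ${heavy:.2f}")
```

Under these assumptions the token multiplier alone doubles the bill for the same task, before any difference in list price, which is the core of the argument for reserving the model for work that needs it.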

The Breakdown

Anthropic's new Claude Fable 5 posts eye-popping benchmark scores and genuinely impressed on vision and long-running technical work, but Claire found it surprisingly bad at design, painful to read for specs, and expensive enough to make model choice matter. Her verdict: this is a serious coding model for hard execution problems, not a default pick for product thinking or front-end polish.
