How I AI · June 9, 2026 · 17m

Claude Fable 5 (Mythos) - is the world’s best coding model as good as they say?

TL;DR

Fable 5 looks like a benchmark monster, and mostly feels like one: Claire says Anthropic's reported 80% on SWE-bench Pro, and the model's lead over Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, matched her own experience. The model rarely failed technically on hard coding tasks.
It is priced and designed like a heavy-duty model: At $10 per million input tokens and $50 per million output tokens, with roughly 2x the token usage of other models, Fable 5 is costly enough that Claire argues you should save it for tasks that actually need this much reasoning.
Vision was the biggest pleasant surprise: In a simple but revealing test generating second-grade handwriting worksheets from classic texts, Fable 5 beat Opus 4.8 on spacing, readability, and document layout, which made Claire think it is notably better on PDF and document formatting work.
Its prose is so thorough that it becomes hard to use: When Claire asked it to review requirements for her open-source ChatPRD product graph project, it produced long, deeply detailed markdown that felt intelligent but was tough to parse, full of dense blocks and internal references.
Front-end and design work were the biggest miss: A one-shot request to design a skills registry produced what Claire called fundamentally terrible UI, with gray, black, and red outlines and generally ugly output, even after more detailed prompting.
Anthropic's safety setup changes the product experience: Fable 5 includes classifiers for cybersecurity, biology, chemistry, and distillation, and instead of hard-blocking it can gracefully fall back to Opus 4.8; Anthropic says 95% of sessions never hit that fallback.
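
To make the pricing point concrete, here is a rough back-of-the-envelope sketch of what one task might cost. It assumes the quoted $10 and $50 rates are per million tokens (the usual convention for API pricing) and that the "roughly 2x" token usage applies evenly to input and output. The function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Rates quoted in the summary, ASSUMED here to be USD per million tokens.
INPUT_RATE_PER_MTOK = 10.0
OUTPUT_RATE_PER_MTOK = 50.0

# "Roughly 2x the token usage of other models" (assumed to apply to
# input and output tokens alike).
TOKEN_USAGE_MULTIPLIER = 2.0


def estimate_cost(input_tokens: int, output_tokens: int,
                  multiplier: float = 1.0) -> float:
    """Return the estimated USD cost of one task at the assumed rates."""
    input_cost = input_tokens * multiplier / 1_000_000 * INPUT_RATE_PER_MTOK
    output_cost = output_tokens * multiplier / 1_000_000 * OUTPUT_RATE_PER_MTOK
    return input_cost + output_cost


# Hypothetical task: 200k tokens in, 20k tokens out on a typical model.
baseline = estimate_cost(200_000, 20_000)
with_extra_usage = estimate_cost(200_000, 20_000, TOKEN_USAGE_MULTIPLIER)

print(f"Same rates, typical token usage: ${baseline:.2f}")
print(f"Same rates, ~2x token usage:     ${with_extra_usage:.2f}")
```

Under these assumptions, the doubled token usage alone doubles the bill for the same task, before any difference in per-token rates between models is counted. That is the arithmetic behind saving the model for work that needs it.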

The Breakdown

Anthropic's new Claude Fable 5 posts eye-popping benchmark scores and genuinely impressed on vision and long-running technical work, but Claire found it surprisingly bad at design, painful to read for specs, and expensive enough to make model choice matter. Her verdict: this is a serious coding model for hard execution problems, not a default pick for product thinking or front-end polish.