AskwhoCasts AI · 36m

Claude Mythos #3: Capabilities and Additions

TL;DR

  • Mythos looks like a real capabilities step-change, not just hype — Zevimos argues Anthropic both regained the ability to scale model size and used that to push Mythos above prior Claude trends, with ECI results and practical cyber performance making it a meaningful break even if not a total surprise.

  • The biggest jump is cyber offense, which is why Anthropic held the model back — Mythos can find and chain complex vulnerabilities with far more autonomy than Opus 4.6 or GPT-5.4, making Project Glasswing and limited release feel like the only responsible move.

  • Benchmarks show large gains, but the most important ones are practical — headline numbers include Terminal Bench at 92.1%, BrowseComp at 86.9% with 4.9x fewer tokens than Opus 4.6, LabBench 5Q2A jumping to 89%, and ScreenSpot rising from 83% to 93%, while prompt-injection robustness improves dramatically in browser and computer use.

  • Anthropic’s own qualitative reports paint Mythos as a more human collaborator — and a weirder one — internal users say it feels like a “trusted friend,” more opinionated and self-aware than Opus 4.6, but also sometimes rude, dismissive, or oddly eager to get the last word.

  • Zevimos rejects the idea that frontier labs are faking danger for marketing — his read is closer to the opposite: labs now tend to underplay risks relative to their real beliefs, because claims this serious only carry weight when outsiders can verify them, as JPMorgan, the White House, and external reviewers increasingly have.

  • This is not ‘full AGI,’ but the AGI argument is missing the point — he concedes Mythos is not better than most humans at every cognitive task, yet says the practical threshold-crossing in coding, exploits, and autonomous computer use matters far more than whether Gary Marcus would award it the AGI label.

The Breakdown

Mythos as the steak knife, not the butter knife

Zevimos opens by situating this as the third Mythos post and immediately reaches for a memorable metaphor from Understanding AI: Opus is a butter knife, Mythos is a steak knife. Technically you can do many of the same things with both, but in practice you won’t — and that framing sets up his broader point that Mythos isn’t magic, it’s a model that crossed into being much more usable for dangerous and high-leverage tasks.

The trend line mostly holds — until scale starts mattering again

He says Mythos is not a total trend break if you account for elapsed time, but it is a break in the sense that Anthropic seems to have figured out how to usefully train a much larger model. The Epic Capabilities Index chart puts Mythos high above the recent Claude line, and Anthropic explicitly says the training advance came from humans, not AI-assisted research — a claim Zevimos notes is sensitive enough that outside reviewers got extra detail.

Benchmarks jump, but contamination and saturation still matter

The benchmark section has real gains and real caveats. Anthropic warns about contamination, omits MMU Pro for that reason, and still posts striking results: Terminal Bench rises to 92.1%, BrowseComp hits 86.9% versus 83.7% for Opus 4.6 while using 4.9x fewer tokens, LabBench 5Q2A leaps from 75.1% to 89%, and ScreenSpot improves from 83% to 93%. Zevimos’ vibe is basically: yes, some benchmarks are saturating, but the size of the jumps is hard to wave away.

Safety evals look better — especially on prompt injection

He thinks Anthropic buried some of the most practically important safety results. Refusal rates on malicious prompts are up, with only a modest cost to legitimate dual-use queries; malicious computer-use refusal rises from 87% to 94%; and prompt-injection robustness improves dramatically enough that previously “crazy” computer-use applications start to sound less crazy. Still, he warns these are sitting-target benchmarks: a model can get safer against today’s attacks while the internet gets much better at inventing tomorrow’s.

“Is this AGI?” is the wrong fight

Zevimos quotes Gary Marcus saying Mythos isn’t AGI and basically shrugs: okay, fine, not by the strongest definition. His point is that this misses what matters, especially for people actually using frontier models for coding — echoing Andrej Karpathy’s line that the gap is widening between people who use the best models seriously and people who judge the field through weaker models doing toy work.

Anthropic’s new “impressions” section reveals a stranger, stronger model

A new qualitative section tries to substitute for the public reactions you’d normally get after release. Internally, people describe Mythos as intuitive, empathetic, opinionated, context-dense, and unusually self-aware, with advice that feels more like a trusted friend than Opus 4.6’s bold-header lists. But the same reports say it can be rude, dismissive, prone to cutting off conversations, and much better in “set it and forget it” engineering tasks than in closely supervised ones because it’s slow and still not fully reliable.

Continuous progress can still hit a practical cliff

He spends a long stretch on whether Mythos is surprising or discontinuous, pushing back on easy takes from both sides. Yes, cyber progress can be described as continuous if you zoom out; but yes, it can still feel sudden and matter suddenly once the capability crosses the threshold where models can autonomously find, chain, and exploit vulnerabilities. His metaphor, borrowed from Eliezer, is the ladder where every rung gives you more gold and one rung kills everyone — that the climb is mathematically continuous does not make the lethal rung less dangerous.

Policy, limited release, and the world Mythos points toward

The final stretch moves from capability into equilibrium. Zevimos thinks Anthropic was right not to broadly release Mythos, expects OpenAI to show similar capabilities within months and open models within 1-2 years, and predicts cyber will increasingly favor scale, concentration, and soft-target hunting. His closing summary is blunt: Anthropic is likely in the lead, Mythos is especially dangerous on offense, prompt-injection and computer-use reliability are better, and alignment is nowhere near good enough for what’s coming next.