GPT-5.6 about to DROP
TL;DR
Anthropic’s IPO could be the AI boom’s first real stress test: Wes Roth says public filings would expose revenue, inference costs, margins, cloud commitments, and customer concentration, giving skeptics and believers actual numbers instead of hype.
Opus 4.8 built a full city-economy benchmark almost end to end: In Claude’s new ultra code mode, the model created a simulation with workers, wages, taxes, welfare, businesses, vehicles, balance sheets, and even iterated on bugs and fairness issues itself.
GPT-5.5 still looks stronger than Opus 4.8 on DeepSWE-style coding tests: Wes points out that Claude Opus 4.8 did not beat GPT-5.5 on the DeepSWE benchmark, and says the missing ultra code mode result is the comparison he really wants to see.
ARC-AGI 3 is tiny in score, but big in significance: Opus 4.8 reportedly hit 1.5 percent, state of the art on ARC-AGI 3, while most models sit at 0.5 percent or below, and observers said its reasoning looked more abstract and human-like.
Benchmark design is shifting from score-chasing to thinking-style testing: Wes highlights ARC-AGI 3, Vending Bench, and DeepSWE as examples of a newer philosophy that tries to force real reasoning on fresh, contamination-free tasks instead of measuring memorized answers.
GPT-5.6 rumors suggest frontier models may ship as rolling updates: References to GPT-5.6 and GPT-5.6 Pro in OpenAI-related backlogs, plus talk of stronger coding agents and a possible 1.5 million-token context window, imply launches may happen every few months rather than yearly.
The Breakdown
Anthropic’s rumored IPO could force the first real financial x-ray of the AI boom, while a separate rumor says OpenAI may answer with GPT-5.6 and a major coding jump. In between, Wes Roth shows Claude Opus 4.8 building a full economic simulation benchmark and explains why its weirdly low but state-of-the-art 1.5 percent on ARC-AGI 3 still matters.