Alex Finn · 1h 51m

LIVE: Opus 4.7 is incredible, new Codex automated my life, Claude Design is MWAH

TL;DR

  • Alex Finn calls Opus 4.7 the best coding model on Earth, then stress-tests that claim live — after arguing that X is full of biased Anthropic hate, he runs his “world famous” four-part benchmark and scores Opus 4.7 a winning 34.5/40, ahead of Opus 4.5 at 33.5 and Gemini 3 Pro.

  • The stream’s most revealing moment comes when Claude breaks on-air right after he defends it — Opus initially stalls in both desktop and CLI, prompting an “egg on my face” meltdown, before suddenly recovering and producing what he calls the best FPS and city-flythrough outputs he’s seen yet.

  • The Claude CLI clearly outperformed Claude Desktop in his testing — the same first-person shooter prompt, run through the CLI, generated bloom post-processing, muzzle flashes, particle bursts, recoil, and much heavier graphics, convincing him to stop recommending the desktop app for serious builds.

  • Codex’s new computer-use workflow impressed him because it automated real work, not toy demos — his standout example was a two-prompt chain where Claude made YouTube membership badges and Codex uploaded them into his YouTube dashboard automatically.

  • He keeps hammering one core point: benchmark AI on useful work, not trick questions — instead of asking things like how many X’s are in “December,” he wants models judged on building apps, games, visualizers, and startup brainstorming, where he says Opus 4.7 was “significantly better” than prior versions.

  • The stream turns motivational in a very Alex Finn way — he responds to a commenter calling him “too hype” with a full rant about waking up excited to build, then later tells the story of teaching himself Swift at 25 to build a demo app for Grindr, which he says completely changed his confidence and career trajectory.

The Breakdown

“If you say it’s muted and it’s not, you’re banned”

Alex opens in full live-stream mode: chaotic, chatty, and upfront that this is not a polished 10-minute upload. He frames it as the biggest AI news week of the year, promising Opus 4.7 benchmarks, Claude Design, ChatGPT Codex computer use, and some Perplexity PC talk — while also warning replay viewers that this is a hangout, not a speedrun.

The Opus 4.7 defense before the benchmark even starts

Before testing anything, he goes hard against what he sees as anti-Anthropic sentiment on X, saying Anthropic barely interacts with him and that he has “the least reason to glaze them.” His blunt take: anyone claiming Opus isn’t the best coding model is either biased, paid, or carrying company beef, and he says the AI creator ecosystem is “95% corrupted.”

The hype rant that became the emotional center of the stream

A viewer complains that he’s too hyped, and Alex turns it into a manifesto. He draws a bright line between dishonest hype and genuine excitement, then says if your problem is simply that he’s too enthusiastic, “I don’t give a [__]” — because he wakes up grateful to be alive in a moment where AI lets him build businesses and products he couldn’t have built before.

A tiny agentic workflow that made the whole stream click

While Opus is spinning up, he demos a simple but sticky workflow: Claude made new YouTube membership badges, then Codex used computer use to upload them into his YouTube dashboard. It’s only two prompts, but for him it’s the clearest proof that these tools are shifting from content gimmicks to actual operational leverage.

The funniest part: Claude immediately faceplants on-air

Right after he spends 20 minutes praising Opus 4.7, Claude stalls repeatedly and appears broken in both the CLI and the desktop app. Alex openly laughs at the disaster — “major egg on my face” — checks Anthropic’s status page, complains about outages, and says this kind of reliability gap is exactly how OpenAI could win after “mortgaging the global economy” for GPUs.

Then the benchmark actually lands — and the CLI steals the show

Once Claude recovers, the first-person shooter benchmark is genuinely strong: reload animation, sound effects, power-ups, enemies, and enough polish for Alex to call it the best FPS benchmark yet. Then the CLI version blows past the desktop output with bloom post-processing, particle effects, recoil, tracer fire, and performance so heavy it nearly melts his 512GB Mac, which makes him reverse his own recent desktop-first recommendation.

Dancing Elon, city flythroughs, and why he thinks people test models wrong

The Elon dancing benchmark is still weird-looking, but by his historical standard it’s the best yet: a recognizable face, an X logo on the shirt, and far more structure than older generations. The city flythrough gets similar praise, and throughout this stretch he keeps returning to the same complaint — too many people judge models with trick questions instead of asking them to do the kind of work users actually care about.

A quick Grok detour, a personal origin story, and the final verdict

He briefly compares against Grok, gripes that Grok 4.3 appears locked behind a $300/month tier, and gets a comically bad result that reinforces his view that xAI still hasn’t delivered despite the hype. Later, after a viewer asks about his mindset, he tells the story of being a low-confidence C-minus student until, at 25, he taught himself Swift in a week to build a demo app for Grindr that helped close a $200,000 deal — the moment he says taught him that skills, confidence, and intelligence are all trainable. By the end, Opus 4.7 wins his benchmark at 34.5/40, and he closes by promising Codex benchmarks next Monday and a live bootcamp immediately after the stream.