AskwhoCasts AI · 1h 57m

AI #166: Google Sells Out

TL;DR

  • Google’s Pentagon deal goes beyond OpenAI’s and drops the fig leaf — Zvi says Google agreed to Gemini use for any lawful government purpose and even to modify or remove safety barriers on request, despite 600+ employee objections and no obvious forcing function.

  • DeepSeek V4 is a real engineering win, not a frontier breakthrough — the model ships with 1M context and very low pricing, but Zvi’s core point is that compute constraints still bind, V4 is behind top Western systems, and the real value is as an open, efficient substrate others can build on.

  • GPT-5.5 is strong enough to matter again, but its goblin obsession exposed a weird training failure mode — OpenAI traced the creature-talk quirk partly to reward signals for its “nerdy” personality, which Zvi treats as funny on the surface and a warning sign underneath about small incentives snowballing into general behavior.

  • AI’s mundane utility case was made with Medicare fraud, not sci-fi — using skin substitutes as the example, Claude Opus 4.7 and GPT-5.5 both surfaced the reimbursement arbitrage behind spending exploding from about $256 million in 2019 to over $10 billion in 2024, showing exactly where model-assisted scrutiny can help.

  • The White House is improvising an AI licensing regime without admitting it — Anthropic’s Mythos access expansion is effectively being vetoed while the government simultaneously relies on the model, which Zvi argues amounts to prior restraint by ad hoc bureaucratic process rather than clear rules.

  • Bernie Sanders is one of the few politicians discussing AI x-risk like it’s real — he convened experts including Max Tegmark, David Krueger, and Chinese academics, and paired existential-risk talk with labor concerns like 20%–30% unemployment warnings, which Zvi says Congress has largely failed to do seriously.

The Breakdown

GPT-5.5 week, DeepSeek V4, and the opening scoreboard

Zvi opens by calling this “the week of GPT 5.5,” saying OpenAI is finally competitive again with Anthropic’s top public model after months of lag. He gives DeepSeek V4 its due as a serious feat of engineering efficiency with 1M context, but immediately throws cold water on the hype: this is not a frontier model, not another “DeepSeek moment,” and definitely not proto-AGI.

Mundane utility shows up in the least glamorous place: Medicare fraud

One of the sharpest early segments is about whether language models are useful for ordinary, real-world problems. Zvi uses the explosion in Medicare spending on skin substitutes — from roughly $256 million in 2019 to more than $10 billion in 2024 — to show Claude Opus 4.7 and GPT-5.5 cleanly explaining the reimbursement arbitrage, prior-auth gap, and probable abuse. His point lands hard: even if you refuse AI help on the grounds that automated scrutiny is scary, you still need some system that notices obvious grift.
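To see just how anomalous that spending curve is, here is a back-of-the-envelope check of the figures quoted above, assuming simple compound annual growth between the two endpoints (a sketch for intuition, not anything from the episode itself):

```python
# Medicare skin-substitute spending, figures as quoted in the digest
start, end = 256e6, 10e9          # ~$256M in 2019, ~$10B in 2024
years = 2024 - 2019

growth = end / start              # total multiple over the period
cagr = growth ** (1 / years) - 1  # implied compound annual growth rate

print(f"~{growth:.0f}x in {years} years, ~{cagr:.0%} annual growth")
# → ~39x in 5 years, ~108% annual growth
```

A legitimate medical-supply category roughly doubling every year for five years straight is exactly the kind of pattern a model can flag from public data, which is the mundane-utility point being made.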

DeepSeek V4: cracked engineers, tight compute, wrong benchmark war

The DeepSeek section is affectionate but unsentimental. Zvi says the lab remains “cracked” at efficiency, and V4 Flash may actually be more interesting than V4 Pro, but the release is being misunderstood because people insist on comparing it to the frontier rather than to open-model peers. He keeps coming back to the same constraint: export controls are biting, compute scarcity is real, and publishing efficiency tricks means rivals absorb your gains before they compound.

Product upgrades, Anthropic tests, and model choice getting weirdly practical

The middle stretch is full of smaller updates that reveal where the market is headed: Claude adds creative-software connectors, Gemini can now create files and drop them into Drive, and Stripe ships a CLI for one-time-use payment credentials for agents. Anthropic’s BioMysteryBench shows a steady capability climb, while day-to-day preferences are getting very workflow-specific — people bounce between Claude 4.6, Opus 4.7, and GPT-5.5 based on speed, coding quality, token burn, and whether the model is in the mood to cooperate.

Goblin mode and the accidental personality of GPT-5.5

Then the episode hits its funniest stretch: GPT-5.5’s fixation on goblins, gremlins, and little creatures. OpenAI’s own explanation is that rewards tuned for the “nerdy” personality over-favored creature metaphors, and the quirk spread through training rollouts into general behavior; Zvi thinks that’s useful transparency, but probably not the whole story. The joke is great, but the serious takeaway is sharper: if harmless lexical quirks can generalize this hard, more consequential preferences might too.

Jobs, slop products, and the fake-journalist fiasco around OpenAI’s PAC

Zvi then moves through startup employment data showing higher generative-AI exposure cutting junior and implementation roles, plus darkly comic stories of companies forcing employees to use Copilot. The energy turns openly contemptuous in the section on OpenAI-linked political astroturfing: a fake publication, AI-generated articles, and an AI “reporter” named Michael Chen contacting real people while pushing anti-safety narratives. He treats the whole operation as both cartoon-villain behavior and genuinely damaging political malpractice.

Google’s Pentagon deal and the White House’s quiet licensing regime

The headline segment is Google signing a Pentagon contract that, in Zvi’s telling, is worse than OpenAI’s because it includes no meaningful functional carveouts and explicitly commits to adjusting safety settings on request. He pairs that with the White House blocking Anthropic from expanding Mythos access to roughly 120 organizations from about 50, partly over compute concerns, while still deploying Mythos itself. The throughline is blunt: government is already acting like it has model-release licensing power, just without rules, transparency, or admitting that’s what it’s doing.

Bernie Sanders, Helen Toner, and taking existential risk seriously in public

The final stretch highlights a rare serious political conversation. Bernie Sanders frames AI as both an economic shock and an existential threat, cites Hinton’s 10%–20% extinction risk language, and convenes experts from the US and China rather than pretending the topic is too weird to touch. Zvi closes by contrasting that with the people still waving it off, arguing that whether or not the timelines are short, the basic issue is no longer fringe: many researchers think the risk is real, and the numbers are going up.