
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Theo says Google’s new Gemini 3.5 Flash looks great on benchmarks but is terrible to actually use. He praises the headline numbers, including near-300 tokens/sec throughput and strong Terminal Bench results, then argues that the model is misleadingly expensive at $1.50 per million input tokens and $9 per million output tokens, and that it burns far too many tokens in practice.
The pricing story is the real gotcha: Theo compares Gemini 3.5 Flash to Gemini 3 Flash ($0.50 in / $3 out) and the old Gemini 2.0 Flash ($0.10 in / $0.40 out), saying Google has effectively pushed users to models that cost up to 20x+ more (15x on input and 22.5x on output against 2.0 Flash) while leaving prices out of its launch materials.
His hands-on coding test was a disaster — in a benchmark where models rebuild his game “Fish Slop,” Gemini 3.5 Flash was the only one that failed, producing broken code, bad assets, and mechanics that didn’t work, while GPT-5.5 handled the task so well he asked it to make the game 3D.
Google replaced a promising open-source CLI with a buggy closed-source one: Theo says Gemini CLI had 100K+ GitHub stars, 6,000 merged PRs, and real community momentum before Google folded it into the new “Antigravity CLI,” which he found crash-prone, awkward, and missing basic polish like reliable exit behavior.
The Railway outage is Theo’s proof that Google Cloud itself is untrustworthy — he points to Railway allegedly being blocked by Google Cloud despite spending $2M+ per month, and connects it to prior incidents like Google accidentally deleting Australian pension fund UniSuper’s cloud subscription.
This is also a people-and-politics story inside Google: Theo goes out of his way to praise Dmitri, Jack, and Gal for building trust around Gemini CLI, then says their work was sidelined after Google brought in the Windsurf founders for Antigravity, turning a community-driven effort into what he sees as corporate “slop.”
Theo starts by saying he’s genuinely scared to publish this because the last time he harshly criticized a Google product, the video got demonetized, suppressed, and manually flagged as “enabling dishonest behavior.” He frames the whole video as a career risk, which gives the rant a different weight: this isn’t just content, it’s him deciding the issue is serious enough to burn goodwill and possibly opportunities.
He’s not denying the benchmark story: by the numbers, Gemini 3.5 Flash looks like the best model Google has shipped. He calls out strong performance on Terminal Bench, SWE-Bench Pro, Toolathlon, BrowseComp Agent, and MMMU-Pro, and says Google clearly optimized it for agentic work rather than raw knowledge. The exceptions are benchmarks like SkateBench, where it underperformed Gemini 3.1 Pro.
Theo says Google’s launch materials conspicuously avoid putting dollar signs next to the performance charts, and he thinks that’s because the economics got much worse. He pegs Gemini 3.5 Flash at $1.50 per million input tokens and $9 per million output tokens, versus Gemini 3 Flash at $0.50 in and $3 out, and old Gemini 2.0 Flash at $0.10 in and $0.40 out — then argues that reasoning-token bloat makes the real cost even uglier.
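The price gap is easier to judge as plain arithmetic. The sketch below uses only the per-million-token prices quoted above; the function names and the example token counts (200,000 input, 50,000 output) are hypothetical and chosen purely for illustration, not taken from the video.

```python
# Per-million-token list prices (USD) as quoted in the summary above:
# (input price, output price).
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
}


def price_multiple(newer, older):
    """Return (input multiple, output multiple) of `newer` over `older`."""
    new_in, new_out = PRICES[newer]
    old_in, old_out = PRICES[older]
    return new_in / old_in, new_out / old_out


def task_cost(model, input_tokens, output_tokens):
    """Cost in USD of one task with the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    for older in ("Gemini 3 Flash", "Gemini 2.0 Flash"):
        in_mult, out_mult = price_multiple("Gemini 3.5 Flash", older)
        print(f"3.5 Flash vs {older}: {in_mult:.1f}x input, {out_mult:.1f}x output")

    # Hypothetical task size, assumed only for illustration.
    for model in PRICES:
        print(f"{model}: ${task_cost(model, 200_000, 50_000):.2f} per task")
```

Because output tokens carry the higher price, any extra reasoning tokens a model emits widen the gap beyond what the list prices alone suggest.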
His core complaint is that Google is selling “speed” while ignoring token efficiency, which he says OpenAI is taking much more seriously. He points to Artificial Analysis output-token comparisons and says Gemini 3.5 Flash ends up among the most expensive modern benchmarked models because it generates so much unnecessary text, so even if tokens stream fast, tasks don’t actually finish faster.
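A rough way to see why streaming speed alone does not settle the question: the time to finish a task is the number of output tokens divided by throughput. In the sketch below, only the roughly 300 tokens/sec figure comes from the summary; the token counts and the slower model's 100 tokens/sec are hypothetical values assumed for illustration, not measurements.

```python
def completion_seconds(output_tokens, tokens_per_second):
    """Time to stream a full response, ignoring latency and tool calls."""
    return output_tokens / tokens_per_second


def break_even_tokens(slow_tokens, slow_tps, fast_tps):
    """Output tokens the faster model can emit before it finishes later
    than the slower, terser model."""
    return slow_tokens / slow_tps * fast_tps


if __name__ == "__main__":
    # Hypothetical: a verbose model at ~300 tok/s vs a terse one at 100 tok/s.
    verbose = completion_seconds(30_000, 300)
    terse = completion_seconds(8_000, 100)
    print(f"verbose, fast model: {verbose:.0f} s")
    print(f"terse, slow model:   {terse:.0f} s")
    print(f"break-even output:   {break_even_tokens(8_000, 100, 300):.0f} tokens")
```

Under these assumed numbers the model that streams three times faster still finishes later, because it emits more than three times as many tokens.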
Theo uses a practical coding benchmark: asking models to rebuild his old game, Fish Slop, from the original source into a cleaner architecture. Gemini 3.5 Flash was, in his words, the only model that outright failed — broken code, bad glow effects, oversized fish, busted feeding and aging systems, and images so sloppy some didn’t even have transparency; meanwhile GPT-5.5 did so well he escalated the ask and had it turn the game 3D.
He then demos the new Antigravity CLI and runs into buggy scrolling, weird input behavior, frozen generation states, awkward sign-in, and no clean way to quit short of typing /exit. What really bothers him is the strategic move behind it: Google is sunsetting support paths for Gemini CLI and Gemini Code Assist under Pro and Ultra plans, while replacing an open-source tool with a closed-source CLI he says is plainly worse.
Theo lingers here because this is personal: he says Gemini CLI wasn’t perfect, but it was a real open-source project with 100K+ GitHub stars, thousands of merged PRs, and useful patterns for skills and workflows. He gives unusual praise to three Google employees — Dmitri, Jack, and Gal — for taking feedback seriously, earning trust privately, and delaying this exact video for over a year because they made him believe Google might actually get it right.
The final act is Railway: Theo says Google Cloud blocked Railway’s account, taking its web-facing layer offline despite Railway allegedly spending more than $2 million a month. He ties that to Google’s history, including the UniSuper incident where an Australian pension fund’s cloud subscription was accidentally deleted, and lands on a bleak thesis: Google has talent, infra, TPUs, and research, but internal politics, churn, and bad incentives keep turning all of that into products he simply doesn’t trust.
