
Google’s Gemini 3.5 Flash may be a bad deal despite the “Flash” label: Eric and Adam argue it’s a “token guzzler,” with priority-tier pricing of up to $2.70 per million input tokens and $16.20 per million output tokens, so it often lands near Sonnet-level costs and no longer fills the cheap workhorse role Gemini Flash used to.
Cursor’s Composer 2.5 is the surprise star of the week for coding: Ray says it has replaced much of his Codex usage because it feels like “a cross between Opus and GPT-5.5,” moves incredibly fast, and even set up a QA workflow that launched multiple browsers and wrote useful bug reports.
The real frontier is not just models but harnesses, orchestration, and workflow design: a huge chunk of the conversation is about why tools, skills, MCPs, and goal loops still feel clunky, with too much context bloat, inconsistent model behavior, and hidden latency while the model figures out what to do before doing it.
Google fumbled the Anti-Gravity 2 rollout and the Gemini CLI deprecation: users woke up to an app update that effectively removed the IDE, broke settings, and forced a separate reinstall, while Gemini CLI users got about 30 days to move, even though Google had helped steward the ACP standard.
GPT-5.5 Low and Medium are quietly becoming default workhorses: all three hosts say they’re leaning heavily on GPT-5.5 because it’s reliable and efficient, with Eric saying 5.5 Low often completes narrow tasks faster than ostensibly smaller models because it overthinks less.
Karpathy joining Anthropic feels like a bet on impact and access to the frontier: the hosts frame the move less as a career twist and more as a technologist wanting to be back in the lab, close to researchers and fast-moving ideas, while the window to shape AI still feels unusually open.
The episode opens with the crew trying to make sense of an absurdly packed AI week: Google I/O, new Gemini models, Composer 2.5, and even Andrej Karpathy landing at Anthropic. Eric immediately throws cold water on the Gemini 3.5 Flash excitement, calling it expensive in practice because it “burns tokens like nothing else,” to the point that GPT-5.5 can end up faster and cheaper despite the higher sticker price.
Adam says he loved the older Gemini Flash line for fast agentic workflows, but thinks Google has changed what “Flash” means. The new model is quick in tokens-per-second, sure, but it’s dramatically pricier, has a weirdly expensive caching model, and seems optimized more for coding than as a general-purpose cheap workhorse. Ray adds that it follows highly structured prompts well, but lacks the easy “playful intelligence” he gets from GPT-5.5 when he’s just talking naturally into a mic.
The group speculates that Google is protecting margins and managing compute scarcity rather than trying to win on price. That leads into a broader complaint: businesses built around the economics of earlier Gemini Flash versions may now be broken, especially since the cheaper Gemini 3 Flash preview is being sunset while the pricier 3.5 GA model takes its place.
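The “token guzzler” argument comes down to simple arithmetic: per-task cost is price times tokens consumed, so a model with a lower sticker price can still cost more if it burns more tokens on the same job. The sketch below illustrates that with the priority-tier figures quoted in the episode. The comparison model's prices and all token counts are hypothetical placeholders chosen for illustration, not measured values or real list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one task for the given token counts."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Priority-tier figures quoted in the episode for Gemini 3.5 Flash.
flash_priority = Pricing(input_per_million=2.70, output_per_million=16.20)

# HYPOTHETICAL comparison model: placeholder prices, not real list prices.
pricier_model = Pricing(input_per_million=5.00, output_per_million=30.00)

# HYPOTHETICAL token usage for the same task: the cheaper-per-token model
# is assumed to consume far more tokens before it finishes.
flash_cost = task_cost(flash_priority, input_tokens=200_000, output_tokens=40_000)
other_cost = task_cost(pricier_model, input_tokens=80_000, output_tokens=10_000)

print(f"lower sticker price, more tokens:  ${flash_cost:.2f}")
print(f"higher sticker price, fewer tokens: ${other_cost:.2f}")
```

Under these assumed numbers the model with the higher per-token price finishes the task for less money. Whether that holds in practice depends entirely on real token counts, which the episode does not quantify.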
Eric goes on a full rant about Google’s coding-product strategy, especially the Anti-Gravity 2 rollout. Existing users updated one day and suddenly found the IDE gone, settings broken, and their workflow replaced with something closer to a rough Codex-style app; meanwhile Gemini CLI is effectively being killed for consumers, even though enterprise users keep a lifeline. The vibe is less “product evolution” and more “you woke up and your tool got swapped out underneath you.”
Then the energy flips: Ray is almost giddy about Cursor Composer 2.5. He says it took over everything from planning to implementation, and even helped him spin up a QA agent that created skill files, launched multiple browsers, clicked through flows at high speed, and wrote bug reports that caught issues he’d normally find himself.
Adam backs him up, calling Composer 2.5 “phenomenal” for coding and real-time prototyping during live conversations. Both of them are stunned by the speed, especially if it’s really based on the notoriously heavy Kimi K2.5 model; Eric is impressed too, but warns that shipping rapid-fire checkpoints in response to bug reports can create regressions and make a model feel unstable from week to week.
From there the talk gets more philosophical and practical: tool search, MCPs, skills, and giant prompt setups are still awkward. Eric argues that dynamic tool loading is partly a crutch for people who preload too much junk, while Adam says he stays minimalist because once you load 50 or 60 tools, things start falling apart; Ray jokes that he “raw dogs skills” but admits he only wants repeatable workflows in there. The key point: model behavior changes enough across GPT, Claude, and others that even a skill prompt that worked yesterday can break tomorrow.
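The trade-off between preloading every tool and loading tools on demand can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory registry and a naive keyword-overlap score. The `Tool` type, the tool names, and `select_tools` are all invented for this example and do not reflect how any particular harness or MCP client implements tool search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A hypothetical tool entry: a name plus a one-line description."""
    name: str
    description: str


def select_tools(task: str, registry: list[Tool], limit: int = 3) -> list[Tool]:
    """Return up to `limit` tools whose descriptions share words with the task."""
    task_words = set(task.lower().split())
    scored = []
    for tool in registry:
        overlap = len(task_words & set(tool.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, tool))
    # Highest overlap first; the sort is stable, so ties keep registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


# Hypothetical registry; a real setup might hold 50 or 60 entries.
registry = [
    Tool("browser_open", "open a web page in a browser"),
    Tool("bug_report", "write a bug report for a failing flow"),
    Tool("send_mail", "send an email message"),
    Tool("run_tests", "run the unit test suite"),
]

task = "open the checkout page in a browser and write a bug report"
selected = select_tools(task, registry, limit=2)
print([tool.name for tool in selected])
```

Only the selected tool definitions would be placed in the model's context, which is the point of dynamic loading. The scoring here is deliberately crude (common words such as “the” count as matches), which hints at why the hosts find real tool search awkward and why a small, curated tool set can be the simpler option.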
When they compare their real usage, the consensus is clear: GPT-5.5 Medium and Low are getting the bulk of the work, while Anthropic usage has dropped hard. Ray still loves Composer 2.5 for coding and Codex for full computer control, including reading Mail, navigating apps, and running in the background on a Mac with a separate cursor, a capability he thinks people are massively underrating. They close by talking about Karpathy joining Anthropic and giving advice for newcomers: stop obsessing over code first, look at your daily workflows, automate one repetitive task, and build your “mana” for delegation from there.