Riley Brown · May 23, 2026 · 26m

AI Agent: The Biggest Updates You Missed This Week (Codex, Claude Code, Cursor)

TL;DR

Codex’s new /goal mode turns agents into marathon runners — Riley says OpenAI’s desktop app can now stay aligned to an outcome for hours or even more than a day, citing examples of 4-hour runs and one reported task that lasted 1 day and 14 hours.
Anthropic added multitasking to Claude Code and landed Andrej Karpathy — the new claude agents terminal flow lets Riley fire off five research jobs at once, while Karpathy joining Anthropic is framed as part of a wider talent rush where CTOs from places like Super.com, Workday, and you.com are leaving big roles to become ICs.
The real race is for the AI “super app,” not just the best model — Riley’s thesis is that enterprises want one platform for chat, coding, knowledge work, integrations, automations, and browser/computer control, and he sees Codex, Claude desktop, and increasingly Cursor all converging on that shape.
Google talked AI agents everywhere but still doesn’t have a clear home for them: after Google I/O, Riley calls the AI story a “nothing burger” except for Gemini Spark, which looks promising as a Gemini mode with cloud agents, folders, Drive, NotebookLM, and multimodal tools, but he thinks Google is spreading attention across Gemini, AI Studio, and Antigravity.
Cursor’s new Composer 2.5 model impressed Riley on speed and cost — he demos generating a Linktree-style landing page in seconds, says it feels close to frontier for front-end work, and notes it’s cheap enough that an hour of use might cost less than $1.
Plugin sharing, annotations, and appshots make Codex feel more like an operating layer — teams can now share plugins across a workspace, designers can leave visual annotations that the AI turns into code changes, and the “command-command” appshot flow lets Codex capture context from any app and even type directly into Google Docs.

Summary

Riley launches his new weekly “super app” watch

Riley opens by saying the AI agent world had a huge week: Codex added /goal, Anthropic hired Andrej Karpathy, and Google is rolling out its own answer to OpenClaw-style agents. The whole point of the new series is to cut through the noise and track the updates that actually matter if you want to stay on the frontier of using AI agents.

Claude Code gets multitasking, and Anthropic keeps hoarding talent

Back in the terminal, Riley shows Anthropic’s new claude agents flow, which lets him spin up multiple tasks at once instead of working in one long linear chat. He fires off five separate research jobs and flips between them with the keyboard, calling it a genuinely fun new way to work, especially because it can use custom subagents like his web research specialist.
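The fan-out pattern Riley describes, several independent jobs started together rather than queued in one linear chat, can be sketched in a few lines of Python. This is only an illustration of the concurrency idea, not how Claude Code is implemented: `run_research_job` and `run_all` are hypothetical names, and the job body is a placeholder where a real agent would call a model or a tool.

```python
import asyncio


async def run_research_job(topic: str) -> str:
    """Hypothetical stand-in for one agent task (assumption, not a real API)."""
    await asyncio.sleep(0.01)  # simulate the time a real job would take
    return f"notes on {topic}"


async def run_all(topics: list[str]) -> dict[str, str]:
    """Start every job at once and collect the results by topic."""
    # gather() runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(*(run_research_job(t) for t in topics))
    return dict(zip(topics, results))


if __name__ == "__main__":
    topics = [f"research question {i}" for i in range(1, 6)]  # five jobs, as in the demo
    for topic, notes in asyncio.run(run_all(topics)).items():
        print(topic, "->", notes)
```

Because the jobs do not depend on each other, the total wait is roughly that of the slowest job rather than the sum of all five.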

Then he shifts to Karpathy joining Anthropic, which he treats like a blockbuster transfer. He says people are comparing it to “Ronaldo joining Manchester City,” while also pointing to a broader pattern: top people from Super.com, Workday, you.com, and even Bun are leaving elite roles to join Anthropic as individual contributors — what he calls the era of the “polymathic individual contributor.”

Why Riley still thinks Claude’s product structure is confused

Riley gives Claude’s desktop app credit for getting better fast, especially around browser reliability and ease of use, and says it’s one of the strongest super app candidates. But he’s still frustrated that skills created in Claude’s Cowork environment don’t cleanly carry over into Claude Code, calling Cowork “one of their biggest mistakes” and basically pleading for “one single app that can do anything.”

Codex adds long-horizon goals, team plugins, and visual editing tools

On the OpenAI side, Riley says /goal is the standout update because it changes Codex from command-following to objective-seeking. His example is absurd on purpose — “create 30 iOS apps” — but the point is that Codex now plans more deeply and stays locked on the end state rather than quitting after the first instruction.
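The difference between following a command and seeking an objective comes down to a loop with a goal check. The sketch below is a toy model of that idea under stated assumptions, not Codex's actual mechanism: `run_until_goal`, `goal_met`, `next_step`, and `max_steps` are all hypothetical names chosen for illustration.

```python
from typing import Callable


def run_until_goal(
    goal_met: Callable[[list[str]], bool],
    next_step: Callable[[list[str]], str],
    max_steps: int = 100,
) -> list[str]:
    """Keep taking steps until the goal check passes or the step budget runs out."""
    done: list[str] = []
    for _ in range(max_steps):
        if goal_met(done):  # re-check the end state before every step
            break
        done.append(next_step(done))  # plan and perform the next piece of work
    return done


if __name__ == "__main__":
    # Toy version of the "create 30 iOS apps" example: each step "builds" one app.
    apps = run_until_goal(
        goal_met=lambda done: len(done) >= 30,
        next_step=lambda done: f"app_{len(done) + 1}",
    )
    print(len(apps))  # 30
```

A command-following agent would amount to calling `next_step` once and stopping. The step budget is a design choice in this sketch so that an unreachable goal cannot run forever.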

He also highlights plugin sharing across teams, which makes Codex more collaborative: you can build a workflow-specific plugin, share it workspace-wide, and teammates get it in a “shared with you” tab. Then he walks through the new annotation-heavy design mode, explaining that it doesn’t edit code directly like Cursor does, but lets you point at UI elements, say things like “make this bigger,” and have the AI apply those changes after the fact.

Appshots are the most “this is the future” Codex demo

The feature Riley is most animated about is appshots: press both command keys and Codex grabs the current app, takes a screenshot, and opens with full context. He shows how that works in a browser, in Superhuman email, and most dramatically in Google Docs, where Codex uses computer control to type directly into the document while he watches.

Google I/O leaves Riley unconvinced, except for Spark

Riley says Google talked nonstop about AI agents at Google I/O, but from his perspective the AI story was basically a “nothing burger” except for Gemini Spark. His core complaint is simple: if you’re a serious user, you still don’t know which Google product is the agent platform. Is it Gemini, AI Studio, or the Antigravity developer tool?

Spark, though, gets his attention because it appears inside Gemini as a dedicated mode with folder access, cloud computers, connections, Drive, Photos, NotebookLM, guided learning, and multimodal generation. He just doesn’t trust Google to focus, and that leads into a blunt critique: he says DeepMind, despite pioneering AI, currently isn’t best at chat, coding, video, or image generation. Even the newly released Gemini 3.5 Flash strikes him as fast but not compelling enough to matter.

Cursor looks more and more like a real super app contender

Riley closes with strong praise for Cursor’s new Composer 2.5 model, saying it’s extremely fast, very cheap, and especially strong for front-end work. In his demo it spins up a Linktree-style landing page in seconds, then restyles it around Cursor’s own brand just as quickly, which he says would take longer in Codex or Claude.

What matters strategically is that Cursor is no longer just a coding tool in his eyes. With an in-app browser, integrations marketplace, automations, and stated ambitions around coding plus knowledge work, he sees it heading straight toward feature parity with Codex and Claude as a full enterprise “super app.”
