The Artificial Intelligence Show Podcast · 15m

Nobody Has AI Agents Figured Out - Even the Biggest Companies

TL;DR

  • OpenAI’s updated Codex is a real step toward general-purpose desktop agents — it can now see, click, and type across any app on your Mac in the background, alongside 90+ new plugins, which makes it feel less like a coding helper and more like a co-worker with a mouse.

  • ‘Token maxing’ is emerging as a messy proxy for AI adoption — Writer CEO May Habib frames maximizing token usage as existential, while Uber’s push to rank engineers on AI usage reportedly helped burn through its entire 2026 AI budget in just four months via Claude Code and Cursor.

  • Enterprise AI is surfacing weird second-order problems nobody fully planned for — Microsoft executive Rajesh Jha argues AI agents may eventually need their own software licenses, logins, inboxes, and permissions, effectively turning agents into paid software seats.

  • The security and governance risks are already very real, not hypothetical — the hosts point to Lovable’s confusing ‘public’ project behavior exposing chat histories and code, plus Vercel tracing a compromise back to an employee’s connected AI platform account.

  • Even the biggest labs and tech companies are improvising through this shift — Amazon is reportedly dealing with AI sprawl, Box CEO Aaron Levie says today’s ‘state-of-the-art’ architectures like RAG and orchestration frameworks age out fast, and Google employees are reportedly using Claude Code because Google’s own coding tools lag in product form.

  • The core takeaway for leaders is not ‘move slower forever’ but ‘treat AI as change management, not just tooling’ — the show’s estimate is that only low single-digit percentages of enterprises truly understand and have responsibly integrated generative AI, and under 1% are ready to scale agentic AI safely.

The Breakdown

Codex Gets Hands, and the Stakes Get Bigger

The episode opens with a cluster of stories that all point in the same direction: agents are escaping the demo phase and crashing into enterprise reality. The clearest example is OpenAI’s upgraded Codex, which now gets background computer use on Mac — it can see, click, and type across apps while you keep working — plus 90+ plugins, turning a coding agent into something much closer to a general-purpose digital operator.

‘Token Maxing’ Sounds Smart Until the Bill Arrives

Then comes the slang of the week: “token maxing,” or pushing employees to consume as many AI tokens as possible as a stand-in for adoption. The hosts note that Writer CEO May Habib sees this as existential, but critics call it empty theater, and Uber becomes the cautionary tale: encouraged usage of Claude Code and Cursor, internal leaderboards, and reportedly the entire 2026 AI budget gone in four months.

Agents Aren’t Just Tools — They May Become Employees With Seats

The conversation widens from usage metrics to enterprise plumbing. Microsoft’s Rajesh Jha argues that if agents start acting like real workers, they’ll need software licenses, email inboxes, logins, permissions, and access controls just like humans — a surprisingly concrete reminder that “deploying agents” may also mean buying them seats and governing them like staff.

Aaron Levie’s Warning: Your AI Stack Will Keep Melting

Box CEO Aaron Levie’s thread gets framed as a sanity check for anyone trying to build on AI right now. His point is blunt: companies need to accept that they’ll keep overhauling their architecture because patterns that felt cutting-edge two years ago — RAG, graph RAG, orchestration frameworks — already look stale, while Amazon reportedly lives the downside with duplicate tools, disconnected data, and AI sprawl.

The Lovable Example Shows How Non-Coders Can Walk Into Security Trouble

Paul then zeroes in on a specific incident that rattled him: a viral claim that Lovable projects exposed source code, database credentials, chat histories, and customer data for projects created before November 2025. What made it sticky was the human reaction — his “I’m standing in line for my plane to Vegas reading this like, what?” — followed by Lovable’s clumsy clarification that “public” didn’t mean what users naturally thought it meant, which is exactly the kind of trap nontechnical builders won’t see coming.

Vercel’s Hack Drives Home That Even the Best Teams Get Burned

The Vercel incident lands next as proof this isn’t just about startup sloppiness. According to the CEO’s post, an employee was compromised through a connected AI platform account, which led to access to Vercel’s Google Workspace and broader environments; the attackers moved with “surprising velocity” and an understanding the CEO suspects was “significantly accelerated by AI.”

Nobody Has This Figured Out — Not Enterprises, Not the Labs

That leads to the show’s main thesis: don’t confuse speed with mastery. Paul argues that maybe only a low single-digit percentage of companies with 250+ employees truly understand and have properly integrated generative AI, and if you narrow that to responsibly scaling agents, it’s “well under 1%,” while Mike predicts a coming “Claude Code moment” for nontechnical knowledge workers that will trigger both excitement and meltdowns.

Even Google Is Playing Catch-Up

The closing note is almost the most revealing: even the model makers are struggling to productize this stuff. The hosts cite Ethan Mollick’s point that Gemini may be excellent through the API but still doesn’t feel competitive with Claude in the app because Anthropic built the better harness around the model — and they mention reporting that Google employees are using Claude Code internally, which says everything about how unfinished this moment still is.