The Artificial Intelligence Show Podcast · 1h 28m

Ep. 211: GPT-5.5, ChatGPT Workspace Agents, The Messy Reality of Agents & Google Cloud Next

TL;DR

  • OpenAI’s GPT-5.5 is a direct push into enterprise “real work” — Paul and Mike frame the model’s 1 million-token context window, top benchmark scores, and focus on coding, computer use, knowledge work, and scientific research as OpenAI responding to Claude’s growing traction inside large companies.

  • ChatGPT Workspace Agents may be the first agent product that actually makes sense to non-technical knowledge workers — Paul says the new templates for finance, sales, marketing, support, and chief-of-staff workflows feel like a potential turning point because they bring Codex-style capabilities into a UI regular employees can actually use.

  • The hard part with agents is no longer imagination — it’s governance, pricing, and production reality — the hosts keep returning to messy questions around token budgets, connector sprawl, security, broken workflows, and whether enterprises should centralize agent building rather than let every team go rogue.

  • Google Cloud Next made one thing unmistakable: agents are the strategy now — from Thomas Kurian to Jeff Dean, Google’s message was all-in on agents, but even Jeff Dean warned we’re only seeing “glimpses of the agent economy” and that reliability and trust are still major constraints.

  • Meta’s employee tracking memo shows where computer-use agents are headed in the bluntest possible way — the company is reportedly capturing clicks, keystrokes, mouse movements, and screenshots to train models, and Paul argues the likely purpose is obvious: either performance monitoring or training systems to replace human work.

  • The most practical AI wins still come from repeatable workflows, not sci-fi demos — Mike highlights how SmarterX cut its annual State of AI for Business report from hundreds of hours to roughly a day, while Paul used ChatGPT Deep Research to answer a startup-creation question with 33 citations, 341 searches, and a 23-minute autonomous report.

The Breakdown

GPT-5.5 lands, and OpenAI’s priorities are suddenly very clear

The episode opens with Paul and Mike sounding genuinely fried, which fits the week: new models, agent launches everywhere, and too much happening to skip an episode. Their first big takeaway is that GPT-5.5 is less about a shiny benchmark win and more about OpenAI finally leaning hard into “real work” — the kind of work where Claude has been quietly winning enterprise mindshare.

Paul says the shift feels obvious after weeks of talking to enterprise leaders, including at Google Cloud Next, where nearly everyone he met was at least experimenting with Claude even if they already had Copilot or Gemini. He ties that to OpenAI’s own language around messy multi-part tasks, memory, tool use, and getting work done without micromanaging prompts.

Sam Altman, Greg Brockman, and the “personal AGI” vision

Paul then pulls in a recent Ashlee Vance interview with Sam Altman and Greg Brockman, which he listened to on his flight home. The interesting part wasn’t some giant reveal, but how plainly they described the direction: models that know the context of your life, rely on memory, and eventually make prompting feel almost unnecessary.

Greg’s framing of “personal AGI” stood out most. Instead of one universal AGI moment, the idea is that the model knows you so well — your work, habits, tasks, preferences — that it feels generally intelligent to you, even if the underlying system is still “jagged” and weirdly superhuman in one moment and preschool-level in the next.

Workspace Agents feel like the first serious bridge from AI nerd tools to normal work

The big product moment of the episode is OpenAI’s new Workspace Agents in ChatGPT. Paul describes opening his team account, clicking into the new agent area, and immediately feeling like this might be the thing they’ve been waiting for: prebuilt templates for a chief of staff, data analyst, sales assistant, and customer support agent, all with connections to Slack, Microsoft tools, Google Drive, Salesforce, and more.

What got him excited is how different this feels from developer-first tools like Claude Code or Codex. His core reaction: if these agents actually work reliably, they don’t just automate tasks — they start to reshape hiring plans, org design, and how companies distribute work, because even hesitant employees could probably be taught to build useful agents in a short lab session.

The reality check: agents are powerful, but enterprise deployment is still a mess

From there, the conversation gets more candid. Mike brings up the growing chorus online asking why everyone isn’t “all in” on agents already, and both hosts push back on that framing without being anti-agent at all.

Paul’s week at Google Cloud Next, plus examples like Jason Lemkin’s SaaStr stack, reinforced the same point: the frontier users are getting real advantages, but they’re also dealing with chaos. Token budgets blow up unexpectedly, agents get access to too many connectors, knowledge bases go stale, production systems break, and nobody has a clean answer yet for governance, budget planning, or deciding which vendor stack to commit to.

The pricing problem nobody has solved yet

One of the stickiest parts of the discussion is cost. Paul says he keeps coming back to the idea that token- and credit-based pricing makes very little sense once agents become core infrastructure for knowledge work, especially when companies can’t predict usage and vendors keep changing the rules.

His blunt thought is that this probably ends up looking more like human replacement pricing than software seat pricing. If an agent reliably does the work of a full-time SDR, analyst, or support role, a CEO would happily pay far more than $20 a month for it — which makes today’s pricing models feel temporary and unstable.

Google Cloud Next: all agents, all the time — but still early

Paul’s Google Cloud Next recap confirms that agent talk isn’t hype from the sidelines anymore; it’s the main strategy. Thomas Kurian said the goal is to make Gemini Enterprise the best place to run and manage agents, while Google showcased an enterprise platform bundling agent studio, runtime, memory, governance, and access to 200+ models.

But Paul’s favorite moments came from Google insiders talking honestly about adoption. Ryan Bach discussed moving from “fishing” with lots of experiments to “farming” with a few lighthouse workflows, while Jeff Dean offered the clearest reality check of the week: we’re only seeing early glimpses of the agent economy, reliability is still a real problem, and AGI is likely still one or two major breakthroughs away, with continual learning looking like a key missing piece.

Meta says the quiet part out loud

In the last major discussion, Mike tees up Meta’s leaked memo about installing tracking software on employee computers to capture clicks, keystrokes, mouse movements, and screenshots so AI models can learn how humans use computers. Paul’s reaction is basically: this is ugly, but it’s not surprising — and it’s not even entirely new.

He says the only plausible reasons for this kind of monitoring are performance management or training systems on how people do their jobs, and if it’s the latter, the workforce implications are pretty obvious. It becomes one more example of why AI literacy now includes understanding what kind of company you work for, what its AI intentions are, and whether it sees human talent as something to augment or something to model and replace.