AI Engineer · 1h 3m

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

TL;DR

  • Paige Bailey’s core pitch is that Google’s stack now spans far more than chat: she rapid-fired through Gemini 3.1 Flash Live, Pro, Flash-Lite, Nano Banana 2, Lyria 3, Genie 3, Gemma 4, Veo 3.1 Light, embeddings, and AI Studio’s app runtime to show how one platform now covers text, code, image, audio, video, music, and world generation.

  • AI Studio is positioned as the fastest path from demo to production code — Paige repeatedly used the “Get code” button to turn UI experiments like YouTube video analysis, URL-grounded comparisons, live multimodal chat, and image editing into Python, TypeScript, and Java snippets.

  • Small models plus tools are the practical story, not just frontier models: in compare mode, Gemini 3.1 Flash-Lite used code execution to draw bounding boxes around green Lego bricks correctly on the first try for well under a penny, and Paige noted that Augment Code reportedly defaulted its agent stack to Gemini 3.1 Pro for combined performance-and-cost reasons.

  • Gemini Live stood out as the most visceral demo — Paige screen-shared a Google image search, had the model describe it, switch to Italian, answer a weather question about London, and then narrate what it saw in a Texan accent, all as a live voice conversation with screen and camera input.

  • AI Studio’s new build flow showed the strongest “vibe coding” moment — from a spoken prompt, Paige had it generate a bookshelf-cataloging app with Google login, Firestore persistence, image upload, search grounding, and public sharing, then proved it worked by uploading a bookshelf photo and seeing books persist across sign-out and sign-in.

  • Some of the flashiest research is still not an API, but it hints at where products are going — Genie 3 generated a navigable Lego-rock Texas landscape with a pink goggle-wearing ostrich carrying a rocket blaster, and Paige explicitly clarified it creates pixels, not 3D meshes or Unity-ready assets.

The Breakdown

A scrappy room, a fast-moving stack

Paige opens by joking that only the “valiant few” made it in early thanks to electrical issues elsewhere in the building, then immediately sets the tone: this will be demo-heavy, not slide-heavy. She frames Google DeepMind as shipping at a breakneck pace, and Gemini 3.1 Flash Live, Pro, Flash-Lite, Nano Banana 2, Lyria 3, Genie 3, Gemma 4, Veo 3.1 Light, multimodal embeddings, and AI Studio’s full-stack runtime all get name-checked in the first few minutes.

AI Studio as the front door to multimodal Gemini

She jumps into AI Studio and walks through the practical knobs: model selection, structured outputs, function calling, code execution, grounding with Google Search or Maps, and URL context. The first live demo is a YouTube dinosaur video; Gemini ingests the URL directly, analyzes the first five minutes, builds a timestamped dinosaur table, and uses Search grounding to attach fun facts — including correctly noting that pteranodons are pterosaurs, not dinosaurs.
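
For flavor, here is roughly what that “Get code” output looks like for this kind of request, sketched with the google-genai Python SDK. The model ID, video URL, and prompt text are illustrative stand-ins, not strings from the talk:

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Gemini can ingest a public YouTube URL directly as a video part.
video = types.Part.from_uri(
    file_uri="https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder for the dinosaur video
    mime_type="video/mp4",
)

response = client.models.generate_content(
    model="gemini-3.1-flash",  # illustrative ID from the talk, not a verified API string
    contents=[
        video,
        "Analyze the first five minutes. Build a timestamped table of every "
        "dinosaur shown, and add one search-grounded fun fact for each.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # Search grounding
    ),
)
print(response.text)
```

Structured outputs, function calling, and the other knobs she mentions toggle the same way, via fields on the request config.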

Cheap vision wins: Lego bricks, compare mode, and code execution

Next she highlights what she thinks people sleep on: compare mode plus sandboxed Python code execution. Using a simple Lego-bricks image, she asks Gemini 3.1 Flash-Lite and Gemini 3 Flash Preview to draw bounding boxes around the green bricks; both get there, but Flash-Lite is faster and absurdly cheap, which Paige hammers home as the real takeaway for builders.
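
A minimal sketch of the same pattern against the API, again assuming the google-genai Python SDK: enabling the code-execution tool lets the model write and run sandboxed Python over an attached image. The model ID and file name are placeholders:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("lego_bricks.jpg", "rb") as f:  # placeholder image file
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-3.1-flash-lite",  # illustrative ID; the cheap tier is the point
    contents=[
        image,
        "Find every green Lego brick, then write and run Python to compute "
        "bounding boxes for them.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# The response interleaves text, the Python the model wrote, and its output.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
```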

Retrieval without the setup pain

She then shows URL context with fresh post-cutoff material: one URL about Gemma 4 and another about Genie 3. Gemini compares them and cites sources inline, and Paige uses this to make a broader point: a lot of teams just need “poor man’s retrieval,” not a full vector database, especially when working with public web content.
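
The URL-context tool is the API-side version of that “poor man’s retrieval.” A hedged sketch with the google-genai SDK, using placeholder URLs in place of the Gemma 4 and Genie 3 posts from the demo:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hand the model the URLs and let it fetch them itself: no scraping,
# no chunking, no vector database.
response = client.models.generate_content(
    model="gemini-3.1-flash",  # illustrative model ID
    contents=(
        "Compare these two announcements and cite which URL each claim comes from:\n"
        "https://example.com/gemma-4-post\n"  # placeholder URLs; the talk
        "https://example.com/genie-3-post"    # used real blog posts
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(url_context=types.UrlContext())],
    ),
)
print(response.text)
```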

Gemini Live feels like the future because it talks back

The energy spikes when she switches to Gemini Live. Sharing her screen, she asks the model what it sees, then has it repeat itself in Italian and answer what the weather is like in London, before pushing it into a poem delivered in a Texas accent, which gets a laugh from the room and a flash of homesickness from Paige.
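
Under the hood, the Live API is a bidirectional streaming session rather than a one-shot call. A stripped-down, text-only sketch with the google-genai SDK follows; a real screen-share session would stream audio and video frames instead, and the live-model ID here is a placeholder:

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = types.LiveConnectConfig(response_modalities=["TEXT"])
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live",  # illustrative live-model ID
        config=config,
    ) as session:
        # A real screen-share demo streams image/audio frames via
        # session.send_realtime_input; this sketch sends one text turn.
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Describe what you see, then say it in Italian.")],
            )
        )
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```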

The build demo: from spoken prompt to working app

Then comes the big app-builder moment. Paige speaks a prompt for an app that lets users upload a bookshelf photo, identify titles and authors from the spines, use Google Search to enrich metadata, require Google login, and persist everything to a database; AI Studio starts scaffolding it with authentication and Firestore while she lets “the model cook in the background.”

Genie 3 and the joyfully weird world-model moment

While that app compiles, she detours into Project Genie. She prompts a Big Bend National Park scene made of Lego rocks with a quadruple rainbow and a pink ostrich wearing goggles and carrying a rocket blaster, then navigates the resulting world live, emphasizing that there’s no Unity or Unreal underneath — just dynamically generated pixels frame by frame.

Media stack tour: SVGs, Nano Banana, VO, Lyria, and the finished bookshelf app

The final stretch is a whirlwind: Gemini produces a rough SVG from the Lego photo, Nano Banana 2 composites a dog into a nature scene holding a Celsius can, Veo 3.1 Light turns a prompt into Warriors-themed vegan food truck footage, and Lyria 3 generates a Spanish-language song about an electronic vegan basketball food truck with Legos. Then AI Studio’s bookshelf app finishes, Paige signs in with Google, uploads a shelf photo, watches it infer titles and authors, confirms persistence after signing out and back in, and ends by nodding to Gemma 4, Pupper robotics, AR use cases, and a likely future AI Studio app from the team.
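
For the Nano Banana-style edit, the API shape is the same generate_content call, just with an image part going in and inline image bytes coming back. A sketch, using the current “Nano Banana” model ID since “Nano Banana 2” is a codename rather than an API string:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("nature_scene.png", "rb") as f:  # placeholder input image
    scene = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # current "Nano Banana"; the talk's successor has no public ID
    contents=[scene, "Add a dog holding a Celsius can into this scene."],
)

# Image-capable models return a mix of text parts and inline image bytes.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("edited_scene.png", "wb") as out:
            out.write(part.inline_data.data)
```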