YC Root Access (18m)

The Data Layer for the Robot Economy

TL;DR

  • Encord is positioning itself as the data layer for physical AI — Eric and Ulrich describe the company as “AI-native data infrastructure” for robotics, autonomous driving, and other embodied systems, helping teams create, manage, annotate, and evaluate the multimodal data their models depend on.

  • They bet on AI data before ChatGPT made it obvious — founded in the late 2010s and part of the YC W21 batch, Encord struggled to raise a seed round when fintech, crypto, and remote work were hotter, with one investor even deciding an Icelandic dating company was a bigger market than AI.

  • ChatGPT didn’t just grow demand — it changed customer psychology — Encord’s early product automated annotation with tiny specialized “micro models,” but many AI companies still wanted humans touching their data until ChatGPT proved AI could be trusted in more general workflows.

  • Physical AI has the opposite bottleneck from LLMs: not compute, but data — Ulrich says language models had abundant internet-scale data and could mainly scale with compute, while robotics now has the compute but still needs real-world embodied data to unlock its own “ChatGPT moment.”

  • Encord is expanding from software into real-world data collection — alongside its platform used by 300+ AI teams including Toyota, the company opened a Bay Area R&D facility where robotics companies can bring their robots into instrumented environments to capture pretraining and deployment data.

  • The company’s ambition is Stripe-for-robotics-data — after raising $110 million total, including a new $60 million Series C led by Wellington Management, Encord says it wants every piece of physical AI data to pass through its system, from pretraining to post-deployment observability and exception handling.

The Breakdown

A $60M raise, and a simple pitch: get the right data in, keep the wrong data out

Nico opens with Encord’s newly announced $60 million Series C led by Wellington Management, and the founders frame the company as “AI-native data infrastructure.” Their job is not selling raw data, but being the universal layer where physical AI teams create, manage, annotate, and evaluate the data that determines whether a model actually works in the real world.

The pre-ChatGPT conviction that data was the defensible layer

Ulrich traces the idea back to the end of the 2010s, when he was doing deep learning research at Imperial and noticed that, among models, compute, and data, data took the longest and felt the most defensible. He and Eric, who came from high-frequency trading and big data systems, looked at the status quo of shipping data to the Philippines for labeling and waiting for it to come back, and thought there had to be a better way.

When investors thought AI was the small market

The funniest and most revealing moment is the fundraising story: in YC W21, AI wasn’t the sexy category, and Encord struggled to raise a seed. One late-stage fund passed because they thought the AI market wouldn’t be big enough, then invested in an Icelandic dating company instead — a perfect snapshot of how early the founders really were.

From computer vision labeling to “micro models” that customers didn’t trust

The first version of Encord focused on automating annotation for computer vision, back when “NLP” and vision were still separate worlds. They built what Ulrich calls “micro models” — tiny specialized models trained from just two or three examples, the opposite of foundation models — but the twist is that even AI companies still didn’t trust AI enough to let it process their data.

ChatGPT became the market education moment

ChatGPT didn’t just create buzz; it made customers believe AI could be reliable across broader workflows. That unlocked Encord’s shift into multimodal systems — not just images or video, but image plus text plus audio — which pushed them deeper into physical AI, where humans already operate multimodally.

Why robotics needs a new kind of data engine

Ulrich lays out the core thesis clearly: LLMs had internet-scale data, so the main gamble was adding compute, but physical AI flips that equation. Robotics now has the compute infrastructure, yet lacks enough high-quality embodied data, which is why Encord sees itself as the company that can unlock the “ChatGPT moment” for robots.

The new offer: real-world data collection and the full data flywheel

Encord says it had mostly stayed away from pretraining and data collection until now, because collecting real-world robotics data is much harder than scraping the internet. So they opened a Bay Area R&D facility where companies can bring robots into custom environments, generate training data, and then keep using Encord through deployment, observability, and exception handling — the full data flywheel, all in one system.

Humans, household robots, and founder advice from the stormy ocean

The founders make a strong case that humans will stay in the loop, especially at the frontier and in exception handling, because a bad chatbot answer is annoying but a hallucinating robot can crash a car or drop a drone from the sky. They point to 300+ AI teams, 150 employees across London and San Francisco, and customers like Toyota and YC company Ato Robotics, then end with two memorable lines: indecision accrues “interest,” and startups should navigate like a rowboat on a stormy ocean — keep your eyes on the island, but don’t try to sail there in a straight line.
