Joe Reis · 46m

The Future of Open Data Infrastructure with George Fraser (CEO of Fivetran)

TL;DR

  • Open data infrastructure is George Fraser’s next act after the modern data stack — he argues every meaningful company should keep a continuously updated copy of all business data in a format it truly controls, specifically Iceberg on cloud object storage.

  • The real fight is no longer just analytics — it’s whether vendors let customers access their own data for AI — Fraser says some vendors have recently gotten “squirrelly,” adding egress fees, extra SKUs, or AI-gated access because they fear AI disruption and stock-price pressure.

  • Fraser thinks “data gravity” is mostly overrated marketing — Fivetran’s view is that if replication is done correctly with change data capture instead of nightly full reloads, egress costs are usually minimal and the classic lock-in argument falls apart.

  • For enterprise AI, the winning pattern is not random app connectors — it’s curated shared context on top of the warehouse/lake you already have — Fraser says tools like Claude connectors work for simple personal sources like Gmail, but business systems like Salesforce, Jira, and billing data need company-specific curation and business rules.

  • He sees AI and BI collapsing into one question-answering layer over mixed data — his example was querying Gong transcripts of last week’s calls longer than 15 minutes, where the filtering is classic SQL and the “summarize themes” part is semantic AI, all on the same underlying model.

  • If Fraser started Fivetran today, he’d delay hiring engineers and lean hard on coding agents — he said it would be “me and Mel and a zillion coding agents up to like $5 million in revenue,” which tells you how seriously he takes agentic software development.

The Breakdown

From modern data stack to “postmodern” open infrastructure

Fraser starts by half-correcting Joe: he doesn’t claim to have coined “modern data stack,” though he credits co-founder Taylor Brown for aggressively popularizing it back in 2015. His bigger point is that the modern data stack already won, and the next idea is “open data infrastructure” — basically the postmodern version, where you keep your own canonical copy of business data in a format you control.

Why vendors are suddenly nervous about your data

He says open data means two things: using interoperable tech like Iceberg on cloud storage, and having the practical right to continuously replicate your data out of SaaS apps and managed systems. The urgency, in his telling, is new: over the last six months, vendors spooked by AI have started talking about tolls, egress charges, extra product SKUs, and forcing customers through the vendor’s own AI tools.

George names names and goes after “data gravity”

Fraser is in no mood to be polite here — he literally says he can be counted on to “name and shame” anyone engaged in “data moats.” He argues “data gravity” is one of the most overrated ideas in data: cloud egress only looks terrifying when pipelines are badly designed, and if you use CDC instead of copying the entire dataset every night, the amount moved is far smaller than people think.
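The CDC argument is easy to see with numbers. A toy sketch (the schema and change events are invented for illustration, not Fivetran's implementation): a nightly full reload ships every row every night, while CDC ships only the change events since the last sync and applies them to the replica.

```python
import json

# Toy source table: 100,000 rows, of which only a handful change today.
source = {i: {"id": i, "status": "active"} for i in range(100_000)}

# A nightly full reload moves every row, every night.
full_reload_bytes = sum(len(json.dumps(r)) for r in source.values())

# CDC moves only the change events captured since the last sync.
changes = [
    {"op": "update", "id": 7, "row": {"id": 7, "status": "churned"}},
    {"op": "insert", "id": 100_000, "row": {"id": 100_000, "status": "active"}},
    {"op": "delete", "id": 42, "row": None},
]
cdc_bytes = sum(len(json.dumps(c)) for c in changes)

def apply_changes(replica, events):
    """Apply insert/update/delete events to the warehouse-side replica."""
    for e in events:
        if e["op"] == "delete":
            replica.pop(e["id"], None)
        else:
            replica[e["id"]] = e["row"]
    return replica

replica = apply_changes(dict(source), changes)
print(f"full reload: {full_reload_bytes:,} bytes; CDC: {cdc_bytes:,} bytes")
```

With three changed rows out of 100,000, the CDC payload is a tiny fraction of the full reload, which is the core of the "egress costs are usually minimal" claim.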

Why SaaS vendors never really own enough context to win

On the SaaS side, he says companies like Salesforce, Workday, and SAP love the strategy-slide fantasy that managing customer data gives them a moat. In practice, it breaks because no vendor has all of a customer’s data — maybe 10% at best — so both the positive case fails and the negative case kicks in: customers hate being blocked from replicating their own information.

The customer playbook: replicate, negotiate, lawyer up if needed

Fraser’s advice gets very concrete here: insist on a replica of your own data, don’t query the source system every time, and write replication rights into big SaaS contracts. He also warns people not to blindly accept a vendor’s interpretation of its own terms of service, because legal language is often counterintuitive and vendors frequently overstate what customers are prohibited from doing.

Why Iceberg matters, and why AI should sit on your existing data stack

When Joe asks for a blueprint, Fraser points to opendatainfrastructure.org, which includes a vendor leaderboard and contract language suggestions. Technically, he’s pretty prescriptive: Iceberg is the best common format because Snowflake, Databricks, DuckDB, Python, Microsoft Fabric, and AWS Glue can all read it, making it the safest long-term bet for unknown future AI workflows.

Shared context beats one-off AI connectors

This is where Fraser gets especially sharp about enterprise AI. He says Claude- or ChatGPT-style connectors are fine for simple personal domains like Gmail or Google Drive, but business data is shared, permissioned, and messy — think Salesforce, billing, Jira, Zendesk — so the real work is curation, not just extraction. Most companies already do that curation for BI, so his thesis is: reuse the same data warehouse or lake foundations for AI.
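The "curation, not just extraction" idea can be sketched as code: instead of pointing an agent at raw app APIs, you pre-join the replicated systems into one account-level context record with company-specific business rules baked in. Every table, field, and rule below is hypothetical.

```python
# Hypothetical rows already replicated into the warehouse/lake.
salesforce = [{"account": "Acme", "owner": "dana", "stage": "renewal"}]
billing = [{"account": "Acme", "arr": 120_000, "past_due": True}]
jira = [{"account": "Acme", "open_p1_tickets": 2}]

def build_context(account: str) -> dict:
    """Curated context record for one account.

    The business rules (e.g. what counts as 'at risk') live here,
    in governed code over the warehouse, not in an ad-hoc prompt.
    """
    sf = next(r for r in salesforce if r["account"] == account)
    bill = next(r for r in billing if r["account"] == account)
    jr = next(r for r in jira if r["account"] == account)
    return {
        "account": account,
        "owner": sf["owner"],
        "arr": bill["arr"],
        # Invented company-specific rule: renewal stage + past-due
        # invoices + open P1 tickets means the account is at risk.
        "at_risk": (
            sf["stage"] == "renewal"
            and bill["past_due"]
            and jr["open_p1_tickets"] > 0
        ),
    }

context = build_context("Acme")
```

The point of the sketch is that this joining and rule-writing is exactly the work BI teams already do, which is why Fraser argues AI should reuse the same warehouse or lake foundations.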

Mixed queries, coding agents, and the very database-shaped future

Fraser describes asking Fivetran’s internal AI to sample Gong transcripts from customer calls longer than 15 minutes in the last week and summarize themes — a perfect example of analytics and semantics blending into one workflow. He also talks like someone personally re-energized by agents: he’s back to prototyping code, uses AI during meetings, built a bot to run his USTA tennis team, and came away convinced that agents need databases, shared state, and eventually a more unified corporate identity model.
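His Gong example splits cleanly into two steps, which a minimal sketch makes concrete. The transcript schema is invented, and `summarize_themes` is a stub standing in for whatever model call a real system would make:

```python
from datetime import date, timedelta

# Hypothetical Gong transcripts, as they might land in the lake.
transcripts = [
    {"call_date": date(2024, 6, 10), "minutes": 42, "text": "pricing concerns ..."},
    {"call_date": date(2024, 6, 11), "minutes": 8, "text": "quick check-in"},
    {"call_date": date(2024, 5, 1), "minutes": 55, "text": "migration planning ..."},
]
today = date(2024, 6, 14)

# Step 1: the "classic SQL" part -- a plain predicate over structured columns
# (last week's calls, longer than 15 minutes).
recent_long_calls = [
    t for t in transcripts
    if t["call_date"] >= today - timedelta(days=7) and t["minutes"] > 15
]

# Step 2: the semantic part -- a model summarizes themes across the filtered text.
def summarize_themes(texts: list[str]) -> str:
    # Stub: a real system would call an LLM here.
    return f"themes across {len(texts)} call(s)"

summary = summarize_themes([t["text"] for t in recent_long_calls])
```

Both steps run over the same underlying tables, which is the sense in which he sees BI and AI collapsing into one question-answering layer.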

Where Fivetran is headed next

By the end, the product roadmap comes into focus: more data lake adoption, transparent migration of existing destinations into Iceberg-backed lakehouse setups, and big ambitions after the pending dbt merger closes. Fraser basically admits Fivetran is becoming a different kind of data platform — one built around interoperability rather than lock-in, with AI, dbt, and open formats all sitting on the same foundation.