Joe Reis··46m

Why Snowflake Bought SelectStar - and What "Data Catalog" Means Now w/ Shinji Kim

TL;DR

  • Snowflake bought SelectStar because the overlap was already massive — Shinji Kim says SelectStar had been a Snowflake premier partner for 3+ years, and 80% of its customers were already Snowflake users, which made the acquisition feel like a natural product and market fit move.

  • “Data catalog” is aging into something closer to a context layer — Kim argues that over the next 3-5 years the category will matter less as a label, because AI systems need continuously updated metadata, business definitions, and relationships that go far beyond a classic BI-era catalog.

  • The real shift is from documenting tables to feeding agents living business context — Kim describes a stack that includes physical metadata, business glossary terms, semantic models, and possibly ontologies or graphs, with agents both consuming and updating that context over time.

  • Semantic models matter now because most data models were built for analytics speed, not business meaning — SelectStar launched semantic model management in early 2025 to let customers define entities, metrics, and logical relationships on top of physical schemas, which Kim sees as critical for AI question-answering.

  • Data catalogs usually fail on adoption, not ingestion — Kim says the common breakdown is that teams buy or build a catalog, populate it, then skip the hard part: curating domains, owners, and taxonomy, and telling the company this is the single pane of glass for data documentation.

  • AI is widening the user base from analysts to business people, but product design has to follow — Kim says SelectStar always aimed at all data consumers, while Snowflake’s Horizon Catalog historically skewed technical; the new direction is explicitly to redesign for business users who want answers without clicking around or writing SQL.

The Breakdown

Why the SelectStar acquisition happened so quickly

Shinji Kim opens with the headline: SelectStar was acquired by Snowflake in December, and just three months later Kim is already integrating its tech into Snowflake Horizon Catalog. Joe is surprised by the speed, but Kim says Snowflake has gotten better at integrating startups fast, and in this case the fit was already obvious after years as a partner.

The 80% Snowflake customer overlap that made the deal make sense

Kim says 80% of SelectStar customers were already Snowflake customers, not by design, but because Snowflake exposed metadata cleanly and made integrations unusually easy. He adds that many of those same customers also ran serious Databricks workloads, but Snowflake often remained the “main solution” for BI and analytics because it was so easy to adopt.

Why “data catalog” is starting to sound like an old label

Joe pushes on how the term has shifted from a BI-era concept into something much broader in the AI era. Kim reports telling the SelectStar team that they may not use the phrase “data catalog” much in a few years, comparing it to how nobody seriously calls Snowflake just a data warehouse anymore.

From metadata repository to AI-ready context layer

Kim traces the old catalog idea back to enterprise inventory systems from vendors like IBM, Collibra, and Alation, then says the modern version is becoming a “context layer.” That means not just metadata from systems, but generated summaries, business attributes, documentation, and whatever else agents need to access, update, and maintain continuously.

Semantic layers, ontologies, and the formats humans still need

Joe riffs on semantic layers, taxonomies, and knowledge graphs colliding with the traditional data world, and even says Claude keeps telling him it wants real-time graph-based semantic grounding. Kim largely agrees on the concept, but says the exact format matters more for humans than for AI: whether the context lives in a business glossary, a Markdown file, or a graph, the point is that people can review it and keep control of what’s true.

What SelectStar learned building agents for real business questions

Kim says SelectStar started with metadata automation — things like column-level lineage, popularity scores, ERD inference, and suggesting top queries tied to official dashboards — then realized customers also needed explicit semantic models. That shift came from a practical problem: end users weren’t just data teams anymore; business users were showing up in the product asking direct business questions, often preferring chat over the UI because they “just want the answer in front of them.”

Why catalogs fail: not freshness alone, but curation and behavior change

Joe points out the classic problem: catalogs get populated, but usage and maintenance fall apart. Kim says the biggest failure mode is adoption — teams buy the tool but don’t curate domains, ownership, taxonomy, or teach the company that this is the source of truth instead of “just ask so-and-so.”

Building AI companies now: faster execution, harder differentiation

When Joe asks how Kim would start a company today, Kim says product creation is much faster, but the real challenge is sharper differentiation and go-to-market in a world where everyone can build. Kim would focus tightly on the specific value delivered, keep the team small, use agents aggressively, and still rely on human taste and domain intuition to make something that stands out.

What excites Kim next: context that updates itself

Kim says the next big thing is context management systems that don’t just collect metadata from multiple sources, but also clean conflicts, absorb user feedback, evaluate responses, and update themselves as queries and business usage change. Joe ends on a bigger note: the thing he’s most excited about — and slightly dreading — is the next model inflection point, the kind of leap that makes everyone suddenly say, “something flipped.”