Latent Space · 48m

⚡️ How to turn Documents into Knowledge: Graphs in Modern AI — Emil Eifrem, CEO of Neo4j

TL;DR

  • Neo4j now pitches itself as turning data into knowledge, not just storing graphs — Emil Eifrem frames the company as a platform for extracting “signal out of the noise,” with the graph as the most knowledge-dense representation at the center.

  • GraphRAG’s real wins are accuracy, developer productivity, and explainability — Eifrem says users like graphs because they can inspect why retrieval happened, unlike opaque vector similarity where an apple and a tennis ball might both score 0.7 for reasons you can’t see.

  • Vector databases are getting squeezed into “search” rather than remaining a durable database category — Eifrem, who has argued this for years, says standalone vector DBs still matter at the high end, but with every database adding vector indexes, “good enough” is swallowing most of the market.

  • The hottest production pattern is vector search to find starting points, then graph traversal to assemble context — In a customer support flow, Neo4j users might retrieve 100 documents with vector + BM25, then rank them using graph signals like author authority, stars, or PageRank.

  • Enterprise AI has shifted fast from drafting to full automation in the last 3-6 months — Eifrem points to a mortgage lender that first used Neo4j-backed systems to help human agents, then started removing the human in the loop after seeing a 20% conversion lift.

  • Eifrem thinks agents need four data sources to reach “escape velocity”: OLTP, OLAP, memory, and context graphs — His core idea is that operational systems hold the present, warehouses hold the past, memory stores agent state, and context graphs capture the messy “why” behind organizational decisions.

The Breakdown

From graph database to “data into knowledge” platform

Eifrem opens by saying Neo4j is still best known as a graph database, but that undersells what it has become. His preferred framing now is: Neo4j transforms data into knowledge by extracting signal from noise and expressing it in a dense, inspectable form.

Why GraphRAG keeps showing up: accuracy, speed, and visible reasoning

Asked what graph intelligence actually buys AI engineers, Eifrem gives the practical answer: better accuracy, better developer productivity, and better explainability. He contrasts vectors with graphs using a nice image — in vector space, an apple and a tennis ball might both be “0.7” similar and you don’t know why; in a graph, the relationship is explicit, visual, and auditable.
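The apple-and-tennis-ball contrast can be made concrete in a few lines. This is a toy sketch, not anything from the episode: the vectors and edge labels are invented, and the point is only that a similarity score is a single opaque number while a graph edge names the relationship.

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: both items land near the query, but the score alone
# does not say whether the overlap is shape, color, or context.
query       = [1.0, 0.8, 0.1]
apple       = [0.9, 0.7, 0.5]
tennis_ball = [0.8, 0.9, 0.0]

print(round(cosine(query, apple), 2))        # similar -- but why?
print(round(cosine(query, tennis_ball), 2))  # similar -- but why?

# Graph representation: the "why" is an explicit, labeled edge.
edges = [
    ("apple", "HAS_SHAPE", "sphere"),
    ("apple", "IS_A", "fruit"),
    ("tennis_ball", "HAS_SHAPE", "sphere"),
    ("tennis_ball", "USED_IN", "tennis"),
]
shared = ({t for s, r, t in edges if s == "apple"}
          & {t for s, r, t in edges if s == "tennis_ball"})
print(shared)  # the overlap is named, not just scored
```

In the vector case the two scores are just numbers; in the graph case the shared `sphere` node, reached via labeled `HAS_SHAPE` edges, is exactly the inspectable, auditable relationship Eifrem is describing.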

The vector database category is fading into search

On vector DBs, Eifrem is not especially diplomatic: he’s been saying for years that this wasn’t a durable database category. His take is that standalone vector products are being compressed from both sides — databases keep adding vector indexes, and specialists like TurboPuffer increasingly sound more like search platforms than databases.

The real production pattern: vector retrieval first, graph traversal second

He lays out the pattern Neo4j sees most often: use vector search plus BM25 to get initial hits, then traverse the graph to build richer context. In the Apple support example, you might start with 100 relevant docs, then use graph structure to prefer documents tied to higher-authority authors or stronger trust signals before handing context to the LLM.
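A minimal sketch of that two-stage pattern, with all scores, documents, and authority values invented for illustration: pretend stage one (vector + BM25) already returned scored candidates, and stage two blends in a graph signal such as author authority — the slot where PageRank over the author graph would plug in.

```python
def rerank(candidates, authored_by, authority, alpha=0.7):
    """Blend the retrieval score with a graph signal (author authority).

    alpha controls how much weight the vector score keeps; (1 - alpha)
    goes to the graph-derived signal.
    """
    def combined(doc_id, vec_score):
        author = authored_by.get(doc_id)
        return alpha * vec_score + (1 - alpha) * authority.get(author, 0.0)
    return sorted(candidates, key=lambda c: combined(*c), reverse=True)

# Stage 1: pretend these came back from vector + BM25 retrieval.
candidates = [("doc_a", 0.91), ("doc_b", 0.89), ("doc_c", 0.85)]

# Stage 2: graph signals -- who wrote what, plus a precomputed
# authority score per author (e.g. from PageRank on the author graph).
authored_by = {"doc_a": "intern", "doc_b": "staff_engineer", "doc_c": "staff_engineer"}
authority = {"intern": 0.1, "staff_engineer": 0.9}

for doc_id, score in rerank(candidates, authored_by, authority):
    print(doc_id, score)
```

Note how `doc_a`, the top vector hit, drops below the two staff-engineer documents once the graph signal is blended in — the same effect as preferring higher-authority authors in the support example.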

Life sciences and banking are where this got very real

One big shift since Latent Space first had him on: these systems are now in production. Eifrem highlights life sciences customers like Novo Nordisk, where over 60 million documents and billions of nodes/relationships help researchers navigate patents, internal research, and published papers; he also calls named entity recognition and entity resolution an oddly under-discussed but essential part of making that work.

Banks are using graph-backed AI to train humans — then replace the handoff

The standout banking anecdote is a large mortgage lender whose young human “agents” average under a year of tenure and need to ramp fast. Their Neo4j-backed system analyzed past conversion paths and helped the bottom quartile improve, reportedly driving a 20% conversion increase; now they’re moving from “draft the message” to “send the message,” which Eifrem sees as the clearest sign that trust has crossed a threshold.

Text-to-Cypher quietly flipped from backup plan to default path

A subtle but important change: a year ago, teams built specialized graph tools first and fell back to text-to-Cypher for odd queries. In the last 3-6 months that inverted: generic text-to-Cypher now leads, and only the edge cases get extracted into custom tools. That matches the host’s broader point that DSLs become far more powerful once people no longer have to learn them by hand.
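One reason teams can let generic text-to-Cypher lead is that generated queries can be guarded before execution. Below is a hedged sketch of that idea — not Neo4j’s implementation. The “generation” step is hard-coded in place of a real LLM call so the example is self-contained, and the validation is a deliberately crude substring check on Cypher’s write clauses.

```python
# Cypher write clauses we refuse to execute from model output.
# A substring check is crude (it would also flag property names
# containing these words) but errs on the safe side for a sketch.
FORBIDDEN = ("CREATE", "MERGE", "DELETE", "SET", "DROP", "REMOVE", "DETACH")

def is_read_only(cypher: str) -> bool:
    """Reject any generated query that contains a write clause."""
    upper = cypher.upper()
    return not any(clause in upper for clause in FORBIDDEN)

def text_to_cypher(question: str) -> str:
    # In a real system this would be an LLM call with the graph schema
    # in the prompt; hard-coded here so the sketch runs standalone.
    return (
        "MATCH (d:Document)-[:AUTHORED_BY]->(a:Author) "
        "WHERE a.name = 'Alice' RETURN d.title"
    )

query = text_to_cypher("Which documents did Alice write?")
if is_read_only(query):
    print(query)  # safe to hand to the database read-only
```

The pattern — generate freely, validate narrowly, and only promote recurring queries into hand-built tools — is the inversion the episode describes.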

Context graphs complete the enterprise agent stack

Eifrem’s big conceptual contribution here is the “four quadrants” of data agents need: operational systems for the present, warehouses for the past, memory for the agent’s own state, and context graphs for the hidden why behind decisions. His favorite example is a sales discount approved over Slack, email, and a phone call — not cleanly recorded anywhere, but exactly the kind of decision trace an enterprise agent would need if it’s going to act with real institutional judgment.
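The discount scenario is from the episode; the tiny data model below is an invented illustration of what a context graph buys you. Once the Slack thread, email, and call note are linked to one decision node, “why did this discount happen?” becomes a traversal rather than an archaeology project.

```python
# Toy context graph as (source, relation, target) triples. All node
# names are made up; the shape, not the data, is the point.
edges = [
    ("discount_42", "APPLIES_TO", "acme_corp_deal"),
    ("discount_42", "APPROVED_IN", "slack_thread_881"),
    ("discount_42", "APPROVED_IN", "email_2024_11_03"),
    ("discount_42", "APPROVED_IN", "call_note_17"),
    ("slack_thread_881", "INVOLVED", "vp_sales"),
    ("email_2024_11_03", "INVOLVED", "cfo"),
]

def why(decision):
    """Collect the scattered evidence trail behind a decision node."""
    sources = [t for s, r, t in edges if s == decision and r == "APPROVED_IN"]
    approvers = sorted({t for src in sources
                        for s, r, t in edges
                        if s == src and r == "INVOLVED"})
    return sources, approvers

sources, approvers = why("discount_42")
print(sources)    # the three scattered records, now tied to one decision
print(approvers)  # who was in the loop, recoverable by an agent
```

None of those three records individually says “this discount was approved and here’s by whom” — the graph edges are what turn them into a decision trace an agent can act on.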

The hard part isn’t the graph — it’s bootstrapping the organization

He thinks the value of context graphs is obvious; the hard part is instrumentation. Startups can bootstrap them via product adoption, but enterprises have to first map messy, conflicting systems into what Neo4j calls a knowledge layer or semantic layer, where definitions like “customer” finally become good enough for agents to use without confidently hallucinating across inconsistent sources.
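The “definitions like ‘customer’ finally become good enough” step is, at its core, entity resolution under an explicit rule. A minimal sketch, with invented systems, field names, and records: two sources disagree on what a customer record looks like, and the knowledge layer reconciles them by a stated, auditable key (here, shared contact email) instead of leaving an agent to guess.

```python
# Two systems with conflicting shapes for the "same" customer.
crm = {"C-100": {"name": "Acme Corp", "email": "ops@acme.example"}}
billing = {"B-77": {"company": "ACME CORPORATION", "contact": "ops@acme.example"}}

def canonical_customers(crm, billing):
    """Resolve entities by shared contact email -- the explicit rule.

    Returns one record per resolved entity, keeping links back to every
    source ID so the merge stays auditable.
    """
    by_email = {}
    for cid, rec in crm.items():
        by_email[rec["email"]] = {"sources": [cid], "name": rec["name"]}
    for bid, rec in billing.items():
        if rec["contact"] in by_email:
            by_email[rec["contact"]]["sources"].append(bid)  # same entity
        else:
            by_email[rec["contact"]] = {"sources": [bid], "name": rec["company"]}
    return by_email

resolved = canonical_customers(crm, billing)
print(resolved)  # one customer, linked back to both source records
```

A real semantic layer resolves on much messier signals than one email field, but the design choice is the same: the rule that declares two records identical is written down and inspectable, so downstream agents inherit one definition of “customer” instead of two.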

A new starter kit, and a final note on AI’s exhilarating mess

Near the end, Eifrem points to a just-released “create context graph” Python tool that can spin up Neo4j-backed context graphs for 22 industries with synthetic or SaaS-connected data. The conversation closes on a very founder/CEO note: this is both the most exciting time to build software and the most unsettling, because software is suddenly malleable again — but only if you remember that someone still has to clean up the vibe-coded mess.