Joe Reis · 56m

From DataFrame to Knowledge Graph in 4 lines of Python w/ Veronika Heimsbakk

TL;DR

  • Four lines get you a working graph: Veronika shows Polars + maplib + Model + map_default turning a CSV into a knowledge graph immediately, then jokes that the talk could end there because the basic promise is already fulfilled.

  • The key idea is triples plus global IDs: Every row becomes explicit subject-predicate-object triples, and shared identifiers like band names let data from bands, vinyl inventory, and festival lineups merge into one graph.

  • SPARQL looks familiar if you know SQL: Veronika frames SPARQL as SQL-like but graph-native, with no explicit joins because the graph's identifiers already define the connections, and SELECT queries return dataframes in maplib.

  • Strings become things with meaning: She upgrades raw genre strings into SKOS concepts, so 'blackgaze' is no longer just text but a concept with broader and narrower relationships, preferred labels, and alternate labels.

  • The demo stays practical, not academic: The graph combines 16 bands, a fictional cheap record store, and fictional festival lineups into concrete queries like 'which Roadburn bands have albums in stock and at what price?'

  • Scale was a direct audience concern: Asked how this holds up in production, Veronika says RDF scaling problems are old news, and that maplib can handle up to about 3 billion triples on disk before breaking on a regular Mac.
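The "triples plus global IDs" idea from the list above can be sketched in plain Python. This is not maplib's API and not the code from the talk: the `rows_to_triples` and `objects` helpers, the `http://example.org/` namespace, the band names, and the price are all hypothetical, chosen only to show why shared identifiers make explicit joins unnecessary.

```python
# Illustrative sketch only: plain Python, not maplib. All names and data are made up.
BASE = "http://example.org/"  # hypothetical namespace for identifiers


def rows_to_triples(rows, key):
    """Turn each row (a dict) into (subject, predicate, object) triples.

    The subject is built from the `key` column, so the same band name
    produces the same subject regardless of which table it came from.
    """
    triples = set()
    for row in rows:
        subject = BASE + row[key].replace(" ", "_")
        for column, value in row.items():
            if column != key:
                triples.add((subject, BASE + column, value))
    return triples


# Three separate "tables" that share only the band name.
bands = [
    {"band": "Example Band", "genre": "blackgaze"},
    {"band": "Other Band", "genre": "doom"},
]
stock = [{"band": "Example Band", "price_eur": 12}]
lineup = [
    {"band": "Example Band", "festival": "Roadburn"},
    {"band": "Other Band", "festival": "Roadburn"},
]

# Merging is just a set union: identical subjects line up on their own.
graph = (
    rows_to_triples(bands, "band")
    | rows_to_triples(stock, "band")
    | rows_to_triples(lineup, "band")
)


def objects(graph, subject, predicate):
    """Return every object attached to `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# "Which Roadburn bands have albums in stock, and at what price?"
roadburn_in_stock = sorted(
    (s, price)
    for s, p, o in graph
    if p == BASE + "festival" and o == "Roadburn"
    for price in objects(graph, s, BASE + "price_eur")
)
print(roadburn_in_stock)
```

The query follows shared subjects rather than joining tables on a column, which is roughly the effect a SPARQL graph pattern gives you; a real RDF store would add typed literals, proper IRIs, and indexing on top of this.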

The Breakdown

A 16-row music dataset turns into a queryable knowledge graph in four lines of Python, then grows into a 241-triple graph that can join bands, albums, festivals, Wikidata, and Discogs without hand-written SQL joins. Veronika Heimsbakk makes the case that knowledge graphs are much less intimidating than they sound, especially once you see Pandas or Polars dataframes map straight into RDF and come back out as dataframes again.
