
Playbook
Tasteful Skills
“Tasteful Skills” argues that the best agent skills are not documentation or best-practice lists.
Spotify is rebuilding personalization around a unified generative model — Shivam Verma says the company is moving away from siloed candidate-generation-plus-ranking systems toward a single transformer-style backbone that can power recommendations, search, and steerable experiences across products.
User modeling is the foundation, and Spotify computes it at huge scale — his team generates daily embeddings for more than 1 billion users, turning years of listening behavior into vectors that downstream systems use to recommend tracks, podcasts, and other content.
Spotify now puts users, songs, and podcast episodes in the same embedding space — Verma shows a visualization where his own user embedding sits near the Big Technology podcast, illustrating how cross-content modeling lets the system reason across music and spoken audio together.
Catalog understanding comes from teaching LLMs Spotify’s inventory with semantic IDs — following ideas popularized in work from Google/YouTube, Spotify compresses item embeddings into 4-6 tokens so models can autoregressively predict the next song or episode like they would generate text.
Open-weight LLMs bring world knowledge, while Spotify fine-tuning adds platform knowledge — the company adapts models like Llama and Qwen to blend general understanding with Spotify-specific content, gaining steerability and explainability while managing tradeoffs like catastrophic forgetting.
The endgame is editable, user-controlled personalization — products like AI DJ, prompted playlists, and the new Taste Profile let users talk to Spotify in natural language, inspect what Spotify thinks their taste is, and even tell it what to keep or forget.
Shivam Verma opens by framing Spotify’s next chapter in personalization: less about classic agent workflows, more about “context engineering on the modeling side.” He grounds it in Spotify’s scale (750 million users, 100 million-plus tracks, roughly 400,000 audiobooks, millions of podcasts, 184 markets) and points to products like AI DJ, prompted playlists, and the new podcast-capable prompted playlists as signs that recommendations are becoming conversational and steerable.
He gives the quick recommender-systems primer: traditional stacks shrink a huge catalog through candidate generation, then pass the survivors through one or more rankers to produce a final list. The problem is organizational as much as technical: different surfaces like home, playlists, search, podcasts, and ads often end up with separate teams and separate models, which means uneven quality and fragmented features.
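To make the two-stage shape concrete, here is a minimal sketch of candidate generation followed by ranking. Everything in it is an illustrative assumption rather than Spotify's implementation: the `Item` fields, the dot-product retrieval, and the 0.8/0.2 score blend are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    embedding: tuple  # toy 2-d item vector (hypothetical)
    popularity: float  # hypothetical extra feature used only by the ranker


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, catalog, k):
    """Stage 1: cheap similarity search that shrinks the catalog to k items."""
    by_similarity = sorted(catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True)
    return by_similarity[:k]


def rank(user_vec, candidates, n):
    """Stage 2: a costlier scorer over the short list (here a toy weighted blend)."""
    def score(it):
        return 0.8 * dot(user_vec, it.embedding) + 0.2 * it.popularity
    return sorted(candidates, key=score, reverse=True)[:n]


catalog = [
    Item("track_a", (0.9, 0.1), 0.2),
    Item("track_b", (0.8, 0.3), 0.9),
    Item("episode_c", (0.1, 0.9), 0.5),
    Item("track_d", (0.2, 0.2), 0.1),
]
user_vec = (1.0, 0.0)

shortlist = generate_candidates(user_vec, catalog, k=3)
final = rank(user_vec, shortlist, n=2)
print([it.item_id for it in final])
```

Note that the ranker can reorder the shortlist (here the popularity feature lifts `track_b` above `track_a`), but it can never recover an item the first stage dropped. When each surface owns its own pair of stages, that limitation is duplicated per team.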
Verma’s team, the user representations group in Spotify’s AI Foundation org, builds the embeddings that tell Spotify who you are. He describes older approaches like autoencoders that compress a user’s features into a compact vector and reconstruct them, but says the company is moving toward sequential transformer-based modeling that treats user interactions as context, much closer to how LLMs work.
One of the coolest moments in the talk is a visualization of Spotify’s newer model: tracks in blue, podcast episodes in pink, users in green, all embedded in the same space. Verma points out his own embedding landing near the Big Technology podcast, making the point viscerally: once the model sees enough user context, it can place people and content on the same “hypersphere” and reason across modalities.
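A shared embedding space of this kind can be sketched as follows. The vectors, keys, and three dimensions are made up for illustration; the only idea taken from the talk is that users, tracks, and episodes live in one space and can be compared directly. Normalizing to unit length places every vector on a hypersphere, so a dot product is a cosine similarity.

```python
import math


def normalize(v):
    """Scale a vector to unit length so it sits on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical entities of three different kinds in one space.
space = {
    ("user", "listener_1"): normalize([0.2, 0.9, 0.1]),
    ("episode", "tech_podcast_ep"): normalize([0.1, 1.0, 0.2]),
    ("track", "pop_track"): normalize([1.0, 0.1, 0.0]),
    ("track", "ambient_track"): normalize([0.0, 0.2, 1.0]),
}


def nearest(query_key, space, top_n=2):
    """Return the top_n most similar entities of any kind to the query."""
    q = space[query_key]
    scored = [(key, cosine(q, vec)) for key, vec in space.items() if key != query_key]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]


neighbors = nearest(("user", "listener_1"), space)
for (kind, name), sim in neighbors:
    print(kind, name, round(sim, 3))
```

Because the lookup ignores entity type, the same query can surface an episode ahead of any track, which is the cross-content behavior the visualization illustrates.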
Once you understand users, the next challenge is teaching an LLM the catalog itself. Spotify combines its own item embeddings with open-weight models like Llama and Qwen, then uses semantic IDs — compressed tokenized versions of item vectors — so the model can generate the “next item” autoregressively, not as plain text but as the next song or episode.
Verma uses a concrete example: Spotify might represent artists with six semantic-ID tokens, and Ariana Grande and Bruno Mars share the first two because both live in a broad pop neighborhood. Later tokens diverge to capture finer-grained differences, creating a hierarchical structure that lets the model learn both broad similarity and niche distinctions.
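The shared-prefix behavior can be reproduced with residual quantization, one common way to build hierarchical semantic IDs. The summary does not say which quantizer Spotify uses, so treat this as a sketch under that assumption. The codebooks, two-level depth, and artist vectors are hypothetical toy values; a real system would learn the codebooks and use more levels.

```python
def nearest_code(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((x - y) ** 2 for x, y in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def semantic_id(vec, codebooks):
    """Quantize level by level, handing each level's residual to the next codebook."""
    tokens = []
    residual = list(vec)
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(tokens)


# Hypothetical fixed codebooks: level 0 is coarse, level 1 refines the residual.
codebooks = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.1, 0.1), (-0.1, -0.1), (0.1, -0.1), (-0.1, 0.1)],
]

# Made-up artist embeddings: two neighbors in one region, one far away.
artists = {
    "pop_artist_a": (1.1, 0.1),
    "pop_artist_b": (0.9, -0.1),
    "jazz_artist": (0.1, 1.1),
}

ids = {name: semantic_id(vec, codebooks) for name, vec in artists.items()}
print(ids)
```

The two pop artists share their first token and differ on the second, while the jazz artist diverges at the first token. A model predicting these tokens autoregressively therefore commits to a broad neighborhood first and refines afterwards.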
The last piece is user control. Taste Profile exposes a text summary of what Spotify thinks you like, then lets you edit it — maybe ask for more Justin Bieber, or tell Spotify to stop leaning on a certain podcast — and feed that signal back into the system.
Because you can’t train directly on every one of Spotify’s 750 million-plus users, Spotify injects personalization by projecting a user embedding into the LLM’s space as a soft token. Verma says this is already showing positive internal results, and if you’re getting next-episode recommendations in Spotify podcasts today, something like this is already in production.
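A soft token of this kind can be sketched as a linear projection of the user embedding into the LLM's hidden size, prepended to the ordinary token embeddings. The dimensions, weights, and function names below are hypothetical; in practice the projection would be learned, and the summary does not describe Spotify's actual architecture.

```python
USER_DIM = 3    # hypothetical user-embedding size
HIDDEN_DIM = 4  # hypothetical LLM hidden size


def project(user_vec, weight, bias):
    """Linear map from user-embedding space into the LLM's hidden size."""
    return [sum(w * x for w, x in zip(row, user_vec)) + b for row, b in zip(weight, bias)]


def build_inputs(user_vec, token_embeddings, weight, bias):
    """Prepend the projected user vector as one extra 'soft' token."""
    soft_token = project(user_vec, weight, bias)
    return [soft_token] + token_embeddings


# Toy projection parameters (HIDDEN_DIM rows of USER_DIM weights each).
weight = [
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.1, 0.1, 0.1],
]
bias = [0.0] * HIDDEN_DIM

user_vec = [1.0, 2.0, 3.0]
# Stand-ins for the embeddings of two ordinary prompt tokens.
token_embeddings = [[0.0] * HIDDEN_DIM, [1.0] * HIDDEN_DIM]

inputs = build_inputs(user_vec, token_embeddings, weight, bias)
print(len(inputs), len(inputs[0]))
```

The design point is that one user costs one extra position in the sequence: the same model weights serve everyone, and only the soft token changes per user.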