AI Engineer · 1h 8m

Demand-Driven Context: A Methodology for Coherent Knowledge Bases Through Agent Failure

TL;DR

  • Raj’s core argument is that enterprise AI fails less because of model quality than because institutional knowledge is a mess. He cites McKinsey’s 2025 figure of 88% adoption against only 6% realized value, and says the missing piece is the “red zone” knowledge living in people’s heads, outdated docs, and duplicated systems.

  • He frames today’s agents like the protagonist in Memento — great at reasoning and code generation, but unable to retain or reconstruct the right domain context without external memory, which makes retrieval alone an incomplete fix.

  • The proposed method is “demand-driven context,” a pull-based workflow where agents learn by failing on real work items — instead of shoving all Confluence/Jira/Slack/GitHub data at an agent upfront, you give it an incident or ticket, let it surface what’s missing, answer those gaps, and have it curate reusable context blocks.

  • Raj’s demo showed the agent turning failure into documentation — in one incident run, it surfaced six previously undocumented entities and then discovered five or six more after a human filled the gaps, with confidence rising across 14 incident cycles from roughly 1.5 to 4.4.

  • He compares the method to TDD for knowledge bases — you don’t build a perfect knowledge system first; you write “failing tests” in the form of tickets and incidents, then incrementally add only the missing context required to make the agent succeed.

  • His practical recommendation is surprisingly simple: store curated context in GitHub first — not because it’s glamorous, but because multiple agents and teams need versioning, PR review, conflict resolution, and a sane persistence layer before anyone ships a shiny SaaS wrapper.

The Breakdown

A sold-out room, IKEA credibility, and the Memento analogy

Raj opens with a warm, slightly nervous energy — joking about the sold-out workshop and the hot room — then introduces himself as a staff software engineer at IKEA, working in a 100+ engineer domain called Deliverance Services. He uses Memento as the framing device: the film’s protagonist can only hold memory for 15 minutes, and Raj says that’s basically today’s agents — brilliant at reasoning, terrible at retaining the institutional knowledge that actually matters.

AI is amazing now, so why aren’t the Jira dashboards moving?

He runs through the now-familiar evolution from prompt engineering to RAG, MCPs, multi-agent systems, and “deep agents,” even joking that Replit can build a full-stack app before your instant noodles are done. But then he lands the punchline: if AI is so good, why aren’t enterprise teams seeing delivery move in Jira or APEX dashboards? He points to McKinsey’s number — 88% adoption, 6% realized value — and says the bottleneck is not capability, it’s context.

The “red zone” problem: tribal knowledge breaks the promise of agents

Raj breaks enterprise work into green, orange, and red knowledge: green is general knowledge LLMs already have, orange is teachable via skills/rules, and red is company-specific institutional knowledge. That red layer is where agents fall down, because enterprise knowledge is fragmented. By his breakdown, 20% is outdated, 20% unreliable, 10% duplicated, and 40% tribal knowledge living only in people’s heads. His point is blunt: plugging 10, 20, or 100 MCP servers into a messy, monolithic knowledge base won’t fix the mess.

Why retrieval isn’t enough, even if the industry keeps selling it

He’s especially sharp on this part because he’s speaking from experience: he says he was the person who built lots of MCP servers and hoped that would prove agents could close Jira tickets semi-autonomously. Instead, he found the outputs nondeterministic, untested, and often only 10-30% useful, leaving him to do “the data entry job” for the agent. In his telling, the industry has a $9 billion retrieval market, but nobody is coming to your company to clean up your knowledge base for you.

Demand-driven context: make the agent pull knowledge by failing first

His proposed answer is a pull model, not a push model. He compares it to onboarding a new hire: you don’t make them memorize the whole company before assigning work — you give them a task, let them ask questions, and ideally have them improve the documentation as they go. That becomes his methodology: give an agent a work item it will fail on, let it enumerate what it’s missing, then use those gaps to both solve the task and build better context blocks for future tasks.
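The pull loop described above can be sketched in a few lines of Python. This is a minimal illustration and not Raj's implementation: the `WorkItem` and `KnowledgeBase` classes, the `run_cycle` function, and the example terms are all hypothetical, and a plain dictionary lookup stands in for the human who answers the agent's questions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """A ticket or incident, plus the domain terms needed to resolve it."""
    title: str
    required_terms: list[str]


@dataclass
class KnowledgeBase:
    """Curated context blocks, keyed by term (hypothetical structure)."""
    blocks: dict[str, str] = field(default_factory=dict)

    def missing(self, item: WorkItem) -> list[str]:
        # The "failure": terms the work item needs that nothing documents yet.
        return [t for t in item.required_terms if t not in self.blocks]


def run_cycle(kb: KnowledgeBase, item: WorkItem, ask_human) -> list[str]:
    """One pull cycle: attempt the item, list the gaps, persist the answers."""
    gaps = kb.missing(item)
    for term in gaps:
        kb.blocks[term] = ask_human(term)  # curate a reusable context block
    return gaps


# Made-up example data; the dict lookup plays the role of the human expert.
kb = KnowledgeBase({"delivery window": "Time slot promised to the customer."})
incident = WorkItem("Orders stuck in routing", ["delivery window", "slot lock"])
answers = {"slot lock": "Hold placed on a slot while checkout completes."}

first = run_cycle(kb, incident, answers.__getitem__)
second = run_cycle(kb, incident, answers.__getitem__)
print(first, second)
```

The second cycle finds no gaps because the first one persisted the answer, which is the point of the method: each failure leaves the knowledge base better prepared for the next work item.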

The demo: incidents as “failing tests” for your knowledge base

Raj’s live demo uses Claude Code with skills, rules, agents, hooks, and a simple file-based stand-in for Confluence, Slack, and GitHub. The key move is that the agent doesn’t just retrieve documents; it scores confidence, identifies undefined terminology and missing business logic, and explicitly lists what was never documented. In one incident example, the knowledge base started with 56 entities, and a single cycle surfaced six undocumented ones, then discovered another five or six once Raj supplied high-level missing context.
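To make the gap report concrete, here is a rough sketch of what such output could look like. The talk does not say how the agent computes confidence, so everything here is an assumption: `gap_report` is a hypothetical function, a fixed `glossary` of single-word terms stands in for the model's own judgment about what counts as domain jargon, and confidence is simply the share of mentioned terms that are documented.

```python
import re


def gap_report(incident_text: str, glossary: set[str], known: set[str]) -> dict:
    """Split the domain terms mentioned in an incident into documented and not."""
    words = set(re.findall(r"[a-z0-9_-]+", incident_text.lower()))
    mentioned = words & glossary              # jargon that appears in the incident
    undocumented = sorted(mentioned - known)  # jargon with no context block yet
    covered = len(mentioned) - len(undocumented)
    # Assumed scoring: fraction of mentioned terms that are documented.
    confidence = covered / len(mentioned) if mentioned else 1.0
    return {"undocumented": undocumented, "confidence": round(confidence, 2)}


# Made-up incident text and term names, for illustration only.
report = gap_report(
    "SLOTLOCK timeout after the ROUTEX retry; carrier-cap exceeded",
    glossary={"slotlock", "routex", "carrier-cap"},
    known={"routex"},
)
print(report)
```

Rerunning the same report after each human answer gives a simple way to watch confidence rise across cycles.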

From painful manual loops to an automated context gap scanner

He’s candid that manual use is exhausting — no one wants to sit through 15 cycles of agent questioning during real operations. So he shifts to automation: feed archived incidents, Jira tickets, or support cases into the framework, let the agent generate probes, test the knowledge base, classify what’s clean, stale, duplicated, incomplete, or tribal, and then output a prioritized board of what to document first. That turns the agent from a passive consumer into a knowledge manager.
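One way to picture the scanner is as a classifier plus a demand count. The sketch below is an assumption-laden illustration, not the framework from the talk: the `Doc` record, the classification rules, the one-year staleness threshold, and the ranking by probe frequency are all invented here, and a real version would use the agent to generate probes and judge the answers.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    term: str
    body: str
    updated: date


def classify(term: str, docs: list[Doc], today: date, max_age_days: int = 365) -> str:
    """Label one term by the state of its documentation (assumed rules)."""
    hits = [d for d in docs if d.term == term]
    if not hits:
        return "tribal"          # asked about, written down nowhere
    if len(hits) > 1:
        return "duplicated"
    doc = hits[0]
    if not doc.body.strip():
        return "incomplete"
    if (today - doc.updated).days > max_age_days:
        return "stale"
    return "clean"


def prioritized_board(probes: list[str], docs: list[Doc], today: date) -> list[tuple[str, str, int]]:
    """Rank non-clean terms by how often archived work items probed for them."""
    demand = Counter(probes)
    rows = [(term, classify(term, docs, today), n) for term, n in demand.items()]
    rows = [r for r in rows if r[1] != "clean"]
    return sorted(rows, key=lambda r: (-r[2], r[0]))


# Made-up documents and probes.
docs = [
    Doc("routex", "Routing service for carrier selection.", date(2025, 1, 10)),
    Doc("carrier-cap", "", date(2025, 2, 1)),
    Doc("zone-map", "Old postcode mapping.", date(2021, 5, 1)),
]
probes = ["slotlock", "slotlock", "slotlock", "carrier-cap", "zone-map", "zone-map", "routex"]
board = prioritized_board(probes, docs, today=date(2025, 6, 1))
for row in board:
    print(row)
```

Sorting by demand keeps the pull principle intact: what gets documented first is whatever real work items asked about most often.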

GitHub, meta-models, and the honest limitations

For storage, Raj is opinionated: use GitHub, because multiple teams and agents need PRs, versioning, and conflict resolution more than they need another startup’s polished UI. He also argues for a meta-model — a map of how business processes, systems, APIs, and jargon connect — so the agent can navigate context instead of rummaging through files blindly. In Q&A he’s careful not to oversell it: he says this works best when scoped tightly, manual operation is painful, combining code and docs creates source-of-truth conflicts, and the approach is still early enough that “by tomorrow on YouTube somebody would have already posted something better.”
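The meta-model idea can be illustrated as a small graph that the agent walks outward from a starting point. The node names, the `META_MODEL` dictionary, and the `related_context` function below are hypothetical; the talk describes the concept, not this representation. In a GitHub-backed setup, a structure like this could live in a versioned file alongside the context blocks, though that layout is also an assumption.

```python
from collections import deque

# Hypothetical meta-model: each node lists the nodes it connects to.
META_MODEL = {
    "process:home-delivery": ["system:routex", "term:slotlock"],
    "system:routex": ["api:POST /routes"],
    "term:slotlock": ["system:checkout"],
    "system:checkout": [],
    "api:POST /routes": [],
}


def related_context(start: str, max_hops: int) -> list[str]:
    """Breadth-first walk returning nodes within max_hops of start, nearest first."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        order.append(node)
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in META_MODEL.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return order


one_hop = related_context("process:home-delivery", 1)
two_hops = related_context("process:home-delivery", 2)
print(one_hop)
print(two_hops)
```

Limiting the walk by hop count is one simple way to keep the loaded context tightly scoped, in line with the caveat that the approach works best on narrow domains.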
