OpenRAG: An open-source stack for RAG — Phil Nash
TL;DR
RAG isn’t dead; it’s just messier than the meme suggests — Phil Nash opens by mocking the “just stuff everything into a million-token context window” argument, saying real businesses have more data, real cost constraints, and very different retrieval needs.
OpenRAG is IBM’s opinionated open-source baseline for modern RAG — it combines Docling for document processing, OpenSearch for hybrid retrieval, and LangFlow for orchestration into a stack meant to be easy to use but still highly customizable.
The hardest part of RAG is still ingestion, especially PDFs — Docling, built at IBM Research Zurich, handles HTML, Word, slides, spreadsheets, audio, video, and PDFs via both a modular pipeline and a Granite Docling 258M vision-language model pipeline.
OpenRAG leans into agentic retrieval instead of one-shot top-K search — rather than embedding the user query once and hoping the answer is in the returned chunks, an agent can decide which searches to run, how many to run, and what to do with the results.
The stack is designed for practical deployment, including offline and air-gapped environments — Nash shows OpenRAG running locally on his laptop with Ollama, Granite 4 3B as the LLM, and Qwen 3 Embedding 0.6B, while noting the whole system can run without external services.
Customization is the real pitch, not just the defaults — from chunk size, OCR, and image descriptions to Google Drive/SharePoint/OneDrive sync, knowledge filters, API access, MCP support, and even adding LangFlow guardrails, OpenRAG is presented as a baseline you’re expected to modify.
The Breakdown
“RAG is dead” gets the opening eye-roll
Nash starts by swatting away the now-routine claim that huge context windows have made RAG obsolete. His point is simple: if every company only had a million tokens of data and loved paying to resend it every query, maybe — but in the real world, RAG is still necessary, just not “solved” in the neat, blog-post way people pretend.
Why RAG stays hard even after the recipe sounds obvious
He runs through the standard recipe — extract text, chunk it, embed it, store it, retrieve top-K — and then immediately undercuts how deceptively clean that sounds. PDFs are painful, chunking is fiddly, embeddings age fast, and the number of possible retrieval tweaks keeps growing: summaries, chunk expansion, cross-encoders, query rewriting, and more.
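That standard recipe can be sketched in a few lines; this is a deliberately toy version (the bag-of-words "embedding" and fixed-size chunker stand in for a real embedding model and a real chunking strategy) just to make the steps concrete:

```python
# Minimal sketch of the classic RAG recipe: chunk, embed, store, retrieve top-K.
# The term-frequency "embedding" is a toy stand-in for a real embedding model.
import math
from collections import Counter

def chunk(text, size=40):
    """Naively split text into fixed-size word chunks (real chunkers are fiddlier)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: a term-frequency vector. Swap in a real model in practice."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    """Return the top-K stored chunks by similarity to the query."""
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, c["vec"]), reverse=True)[:k]

docs = ["Docling parses PDFs into structured doc tags.",
        "OpenSearch stores embeddings for hybrid retrieval."]
store = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]
top = retrieve("how are PDFs parsed?", store, k=1)
```

Every one of Nash's pain points lives in one of these functions: `chunk` hides the fiddly decisions, `embed` ages with the model, and `retrieve` is where all the top-K tweaks (rewriting, reranking, expansion) pile up.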
OpenRAG: three open-source projects stitched into one stack
IBM’s answer is OpenRAG, which bundles Docling, OpenSearch, and LangFlow into what Nash calls a powerful, easy-to-extend baseline. The framing is important: every RAG system will still be different, but this gives teams a strong starting point instead of rebuilding the same plumbing from scratch.
Docling takes on the ugliest part of the pipeline
The ingestion section is really a love letter to document parsing, especially PDFs — “that enemy of all RAG systems.” Docling supports everything from markdown and Word docs to audio, video, and scanned PDFs, with both a modular PDF pipeline and a newer all-in-one Granite Docling 258M VLM; it outputs a structured intermediate format called doc tags that can be turned into markdown, HTML, or JSON and chunked hierarchically.
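The idea behind hierarchical chunking of a structured document can be illustrated with a toy example — note this mimics the concept, not Docling's actual API or doc-tags format:

```python
# Toy illustration of hierarchical chunking: each chunk carries its heading
# path, so retrieval sees "where in the document" a passage came from.
# The nested-dict shape here is illustrative, not Docling's real data model.
doc = {"title": "OpenRAG", "sections": [
    {"heading": "Ingestion", "paras": ["Docling parses PDFs.", "Output is doc tags."]},
    {"heading": "Retrieval", "paras": ["OpenSearch does hybrid search."]},
]}

def hierarchical_chunks(doc):
    for sec in doc["sections"]:
        for para in sec["paras"]:
            # Prefix each chunk with its place in the hierarchy; this context
            # is exactly what flat text extraction from a PDF throws away.
            yield f'{doc["title"]} > {sec["heading"]}: {para}'

chunks = list(hierarchical_chunks(doc))
```

A chunk like `OpenRAG > Retrieval: OpenSearch does hybrid search.` is far more retrievable than the bare sentence, which is the practical payoff of parsing structure instead of scraping text.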
OpenSearch handles hybrid retrieval, with some thoughtful scaling choices
Once chunks are embedded, OpenRAG stores them in OpenSearch for both vector and keyword search, plus filtering and aggregation. Nash also highlights multi-embedding support for model migrations and the default use of the open-source JVector KNN plugin, which allows live indexing and disk-based scaling so the full index doesn’t have to live in memory.
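A hybrid search in OpenSearch combines a keyword (BM25) clause and a k-NN clause in one request. The sketch below just constructs the request body; the index and field names (`chunks`, `text`, `embedding`) are assumptions, and OpenSearch hybrid queries additionally require a search pipeline configured for score normalization:

```python
# Sketch of an OpenSearch hybrid query body: one keyword leg, one vector leg.
# Field/index names are illustrative; a score-normalization search pipeline
# must be configured separately for the "hybrid" query type to work.
query_vector = [0.1, 0.2, 0.3]  # stand-in for a real query embedding

body = {
    "size": 5,
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": "what is docling?"}},        # keyword (BM25) leg
                {"knn": {"embedding": {"vector": query_vector,  # vector leg
                                       "k": 5}}},
            ]
        }
    },
}

# With the opensearch-py client this would be sent roughly as:
#   client.search(index="chunks", body=body,
#                 params={"search_pipeline": "hybrid-pipeline"})
```

Because both legs run against the same index, the filters and aggregations Nash mentions can be layered onto either clause without a second datastore.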
Agentic retrieval replaces the usual “embed once and hope” workflow
On the generation side, OpenRAG uses an agent to decide what searches to run instead of doing a single nearest-neighbor lookup and tossing top-K chunks into the prompt. Nash’s summary is basically that the model gets tools and instructions, then figures out how to search, iterate, and use results — a more flexible retrieval loop than classic RAG.
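The shape of that loop can be sketched without any LLM at all — here a stub stands in for the model, but the control flow (the agent chooses queries, accumulates context, and decides when to stop) is the part that differs from classic one-shot RAG:

```python
# Toy sketch of an agentic retrieval loop. The "agent" here is a stub standing
# in for an LLM with tool access; the point is the iterate-and-decide control
# flow, not the (naive keyword-match) search tool itself.
def search(query, corpus):
    """Hypothetical search tool: return docs sharing any word with the query."""
    return [doc for doc in corpus
            if any(w in doc.lower() for w in query.lower().split())]

def agent(question, corpus, max_steps=3):
    context, queries = [], [question]
    for _ in range(max_steps):
        if not queries:
            break  # the agent has decided it has enough context
        results = search(queries.pop(0), corpus)
        context.extend(r for r in results if r not in context)
        # A real agent would let the LLM inspect `context` here and propose
        # follow-up queries, reranking, or an early stop.
    return context

corpus = ["Docling handles PDF parsing.", "OpenSearch provides hybrid retrieval."]
ctx = agent("how does parsing work?", corpus)
```

Classic RAG is this loop with `max_steps=1` and no chance to reformulate; giving the model the loop is the whole "agentic" upgrade.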
The live demo shows a product that’s already pretty usable
Running locally on his laptop, Nash shows the chat UI, where even the default “What is OpenRAG?” query triggers tool use — though the answer lives in the prompt, the agent still fetches the current date “just in case,” which he finds charming. He walks through document upload, folder sync, chunk inspection, and knowledge filters that let users limit chat to specific document subsets.
Under the hood: local models, cloud sync, and LangFlow-level extensibility
In settings, he shows Google OAuth-powered connectors for Google Drive, SharePoint, and OneDrive, plus local Ollama-backed models including Granite 4 3B and Qwen 3 Embedding 0.6B. Then he jumps into LangFlow itself, edits the agent graph, and casually adds guardrails and a parser — a nice concrete example of the project’s real selling point: it’s opinionated, but not boxed in.
The closing pitch: open source all the way down
Nash lands on a pragmatic note: whether RAG is “solved” depends on your data and users, but OpenRAG at version 0.4.0 is ready to try today. The frontend is Next.js, the rest is Python, there’s both API and MCP server access, and he ends by inviting people not just to use it, but to contribute across OpenRAG, Docling, OpenSearch, and LangFlow.