OpenRAG: An open-source stack for RAG — Phil Nash
TL;DR
RAG isn’t dead; it’s just messier than the meme suggests — Phil Nash opens by mocking the “just stuff everything into a million-token context window” argument, saying real businesses have more data, real cost constraints, and very different retrieval needs.
OpenRAG is IBM’s opinionated open-source baseline for modern RAG — it combines Docling for document processing, OpenSearch for hybrid retrieval, and LangFlow for orchestration into a stack meant to be easy to use but still highly customizable.
The hardest part of RAG is still ingestion, especially PDFs — Docling, built at IBM Research Zurich, handles HTML, Word, slides, spreadsheets, audio, video, and PDFs via both a modular pipeline and a Granite Docling 258M vision-language model pipeline.
OpenRAG leans into agentic retrieval instead of one-shot top-K search — rather than embedding the user query once and hoping the answer is in the returned chunks, an agent can decide which searches to run, how many to run, and what to do with the results.
The stack is designed for practical deployment, including offline and air-gapped environments — Nash shows OpenRAG running locally on his laptop with Ollama, Granite 4 3B as the LLM, and Qwen 3 Embedding 0.6B, while noting the whole system can run without external services.
Customization is the real pitch, not just the defaults — from chunk size, OCR, and image descriptions to Google Drive/SharePoint/OneDrive sync, knowledge filters, API access, MCP support, and even adding LangFlow guardrails, OpenRAG is presented as a baseline you’re expected to modify.
The Breakdown
“RAG is dead” gets the opening eye-roll
Nash starts by swatting away the now-routine claim that huge context windows have made RAG obsolete. His point is simple: if every company only had a million tokens of data and loved paying to resend it every query, maybe — but in the real world, RAG is still necessary, just not “solved” in the neat, blog-post way people pretend.
Why RAG stays hard even after the recipe sounds obvious
He runs through the standard recipe — extract text, chunk it, embed it, store it, retrieve top-K — and then immediately undercuts how deceptively clean that sounds. PDFs are painful, chunking is fiddly, embeddings age fast, and the number of possible retrieval tweaks keeps growing: summaries, chunk expansion, cross-encoders, query rewriting, and more.
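That standard recipe can be sketched in a few lines; this is a deliberately toy version (the bag-of-words "embedding" and fixed-size chunker stand in for a real embedding model and a real chunking strategy) just to make the steps concrete:

```python
# Minimal sketch of the classic RAG recipe: chunk, embed, store, retrieve top-K.
# The term-frequency "embedding" is a toy stand-in for a real embedding model.
import math
from collections import Counter

def chunk(text, size=40):
    """Naively split text into fixed-size word chunks (real chunkers are fiddlier)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: a term-frequency vector. Swap in a real model in practice."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    """Return the top-K stored chunks by similarity to the query."""
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, c["vec"]), reverse=True)[:k]

docs = ["Docling parses PDFs into structured doc tags.",
        "OpenSearch stores embeddings for hybrid retrieval."]
store = [{"text": c, "vec": embed(c)} for d in docs for c in chunk(d)]
top = retrieve("how are PDFs parsed?", store, k=1)
```

Every one of Nash's pain points lives in one of these functions: `chunk` hides the fiddly decisions, `embed` ages with the model, and `retrieve` is where all the top-K tweaks (rewriting, reranking, expansion) pile up.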
OpenRAG: three open-source projects stitched into one stack
IBM’s answer is OpenRAG, which bundles Docling, OpenSearch, and LangFlow into what Nash calls a powerful, easy-to-extend baseline. The framing is important: every RAG system will still be different, but this gives teams a strong starting point instead of rebuilding the same plumbing from scratch.
Docling takes on the ugliest part of the pipeline
The ingestion section is really a love letter to document parsing, especially PDFs — “that enemy of all RAG systems.” Docling supports everything from markdown and Word docs to audio, video, and scanned PDFs, with both a modular PDF pipeline and a newer all-in-one Granite Docling 258M VLM; it outputs a structured intermediate format called doc tags that can be turned into markdown, HTML, or JSON and chunked hierarchically.
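The idea behind hierarchical chunking of a structured document can be illustrated with a toy example — note this mimics the concept, not Docling's actual API or doc-tags format:

```python
# Toy illustration of hierarchical chunking: each chunk carries its heading
# path, so retrieval sees "where in the document" a passage came from.
# The nested-dict shape here is illustrative, not Docling's real data model.
doc = {"title": "OpenRAG", "sections": [
    {"heading": "Ingestion", "paras": ["Docling parses PDFs.", "Output is doc tags."]},
    {"heading": "Retrieval", "paras": ["OpenSearch does hybrid search."]},
]}

def hierarchical_chunks(doc):
    for sec in doc["sections"]:
        for para in sec["paras"]:
            # Prefix each chunk with its place in the hierarchy; this context
            # is exactly what flat text extraction from a PDF throws away.
            yield f'{doc["title"]} > {sec["heading"]}: {para}'

chunks = list(hierarchical_chunks(doc))
```

A chunk like `OpenRAG > Retrieval: OpenSearch does hybrid search.` is far more retrievable than the bare sentence, which is the practical payoff of parsing structure instead of scraping text.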
OpenSearch handles hybrid retrieval, with some thoughtful scaling choices
Once chunks are embedded, OpenRAG stores them in OpenSearch for both vector and keyword search, plus filtering and aggregation. Nash also highlights multi-embedding support for model migrations and the default use of the open-source JVector KNN plugin, which allows live indexing and disk-based scaling so the full index doesn’t have to live in memory.
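A hybrid search in OpenSearch combines a keyword (BM25) clause and a k-NN clause in one request. The sketch below just constructs the request body; the index and field names (`chunks`, `text`, `embedding`) are assumptions, and OpenSearch hybrid queries additionally require a search pipeline configured for score normalization:

```python
# Sketch of an OpenSearch hybrid query body: one keyword leg, one vector leg.
# Field/index names are illustrative; a score-normalization search pipeline
# must be configured separately for the "hybrid" query type to work.
query_vector = [0.1, 0.2, 0.3]  # stand-in for a real query embedding

body = {
    "size": 5,
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": "what is docling?"}},        # keyword (BM25) leg
                {"knn": {"embedding": {"vector": query_vector,  # vector leg
                                       "k": 5}}},
            ]
        }
    },
}

# With the opensearch-py client this would be sent roughly as:
#   client.search(index="chunks", body=body,
#                 params={"search_pipeline": "hybrid-pipeline"})
```

Because both legs run against the same index, the filters and aggregations Nash mentions can be layered onto either clause without a second datastore.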
Agentic retrieval replaces the usual “embed once and hope” workflow
On the generation side, OpenRAG uses an agent to decide what searches to run instead of doing a single nearest-neighbor lookup and tossing top-K chunks into the prompt. Nash’s summary is basically that the model gets tools and instructions, then figures out how to search, iterate, and use results — a more flexible retrieval loop than classic RAG.
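The shape of that loop can be sketched without any LLM at all — here a stub stands in for the model, but the control flow (the agent chooses queries, accumulates context, and decides when to stop) is the part that differs from classic one-shot RAG:

```python
# Toy sketch of an agentic retrieval loop. The "agent" here is a stub standing
# in for an LLM with tool access; the point is the iterate-and-decide control
# flow, not the (naive keyword-match) search tool itself.
def search(query, corpus):
    """Hypothetical search tool: return docs sharing any word with the query."""
    return [doc for doc in corpus
            if any(w in doc.lower() for w in query.lower().split())]

def agent(question, corpus, max_steps=3):
    context, queries = [], [question]
    for _ in range(max_steps):
        if not queries:
            break  # the agent has decided it has enough context
        results = search(queries.pop(0), corpus)
        context.extend(r for r in results if r not in context)
        # A real agent would let the LLM inspect `context` here and propose
        # follow-up queries, reranking, or an early stop.
    return context

corpus = ["Docling handles PDF parsing.", "OpenSearch provides hybrid retrieval."]
ctx = agent("how does parsing work?", corpus)
```

Classic RAG is this loop with `max_steps=1` and no chance to reformulate; giving the model the loop is the whole "agentic" upgrade.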
The live demo shows a product that’s already pretty usable
Running locally on his laptop, Nash shows the chat UI, where even the default “What is OpenRAG?” query triggers tool use — though the answer lives in the prompt, the agent still fetches the current date “just in case,” which he finds charming. He walks through document upload, folder sync, chunk inspection, and knowledge filters that let users limit chat to specific document subsets.
Under the hood: local models, cloud sync, and LangFlow-level extensibility
In settings, he shows Google OAuth-powered connectors for Google Drive, SharePoint, and OneDrive, plus local Ollama-backed models including Granite 4 3B and Qwen 3 Embedding 0.6B. Then he jumps into LangFlow itself, edits the agent graph, and casually adds guardrails and a parser — a nice concrete example of the project’s real selling point: it’s opinionated, but not boxed in.
The closing pitch: open source all the way down
Nash lands on a pragmatic note: whether RAG is “solved” depends on your data and users, but OpenRAG at version 0.4.0 is ready to try today. The frontend is Next.js, the rest is Python, there’s both API and MCP server access, and he ends by inviting people not just to use it, but to contribute across OpenRAG, Docling, OpenSearch, and LangFlow.