What it actually takes to run RAG in production
A retrieval-augmented generation demo takes an afternoon. A RAG system you can put in front of customers takes considerably more — and the difference is almost never the model.
The demo-to-production gap
Most RAG prototypes skip the three things that decide whether the system survives contact with real users:
- Evaluation. Without a golden dataset and regression tests, every prompt tweak is a gamble. We gate deploys on faithfulness and citation accuracy.
- Access control. Retrieval must respect who is allowed to see what. Bolting this on later means re-architecting the index.
- Monitoring. Hallucination rate, latency, and cost all drift over time. You want alerts, not customer complaints.
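One way to make retrieval respect permissions is to store them in the index next to the vectors and filter before ranking. The sketch below assumes each chunk carries the set of groups allowed to read it. `Chunk`, `allowed_groups`, and `retrieve` are hypothetical names, and the brute-force cosine similarity stands in for a real vector store. Treat it as an illustration of the ordering, not a reference implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical chunk record: text, its embedding, and the groups allowed to read it.
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_emb, chunks, user_groups, k=3):
    """Return the top-k chunks the user may see, ranked by similarity.

    The permission filter runs before ranking, so restricted chunks can neither
    reach the prompt nor push permitted chunks out of the top k.
    """
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    return visible[:k]


chunks = [
    Chunk("hr-1", "Salary bands", (1.0, 0.0), frozenset({"hr"})),
    Chunk("kb-1", "Holiday policy", (0.9, 0.1), frozenset({"all"})),
    Chunk("kb-2", "VPN setup", (0.0, 1.0), frozenset({"all"})),
]
hits = retrieve((1.0, 0.0), chunks, user_groups=frozenset({"all"}), k=2)
print([c.doc_id for c in hits])  # ['kb-1', 'kb-2']: hr-1 is the closest match but not visible
```

Filtering before taking the top k matters. If the k nearest chunks were fetched first and restricted ones dropped afterwards, a user could get fewer results than requested, or none, even though permitted matches exist further down the ranking. Many vector databases offer metadata filters for this, but whether a given product pre-filters or post-filters is worth checking.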
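A simple way to get alerts instead of complaints is to compare a rolling window of each metric against a baseline and fire when it drifts past a tolerance. This is a minimal sketch: the `DriftAlert` class, the window size, and the 20% tolerance are illustrative assumptions. In practice the same check would run per metric (hallucination rate, p95 latency, cost per query) inside whatever monitoring stack is already in place.

```python
from __future__ import annotations

from collections import deque
from statistics import fmean


class DriftAlert:
    """Hypothetical drift check: alert when the rolling mean of a metric
    exceeds its baseline by more than a relative tolerance."""

    def __init__(self, name: str, baseline: float, tolerance: float = 0.2, window: int = 50):
        self.name = name
        self.baseline = baseline
        self.tolerance = tolerance  # 0.2 means "alert above baseline * 1.2"
        # Keeps only the most recent `window` observations.
        self.values: deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> str | None:
        """Record one observation; return an alert message if the window has drifted."""
        self.values.append(value)
        if len(self.values) < self.values.maxlen:
            return None  # not enough data yet to judge drift
        mean = fmean(self.values)
        limit = self.baseline * (1 + self.tolerance)
        if mean > limit:
            return f"{self.name}: rolling mean {mean:.3f} exceeds limit {limit:.3f}"
        return None


latency = DriftAlert("p95_latency_s", baseline=1.0, tolerance=0.2, window=3)
for v in [1.0, 0.9, 1.1, 2.0, 2.0]:
    msg = latency.observe(v)
    if msg:
        print(msg)
```

A relative tolerance against a fixed baseline keeps the check easy to reason about. A real setup would likely also alert on absolute ceilings, such as a hard latency target, and track cost per query rather than total spend so that traffic growth alone does not trigger alarms.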
How we approach it
At nicojahn we ship every RAG engagement with an eval harness from day one and deploy into the client's own EU cloud region. The model is provider-flexible; the quality bar is not.
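An eval harness that gates deploys can start as a script that scores each golden-set case and blocks the release when an aggregate falls below a threshold. The sketch below is a hedged illustration, not a description of any particular harness: the `EvalCase` fields, the 0.95/0.9 thresholds, and the scoring functions are hypothetical. Citation accuracy is checked as "every cited ID was actually retrieved". Faithfulness is replaced by a crude proxy, `fact_recall`, that looks for required facts in the answer; production setups usually score faithfulness with an LLM judge or an entailment model instead.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalCase:
    # Hypothetical golden-set record produced by running the RAG pipeline on one question.
    question: str
    answer: str                # model output, with citations written as [doc-id]
    retrieved_ids: set[str]    # IDs of the chunks actually passed to the model
    required_facts: list[str]  # strings a correct answer must contain


CITATION = re.compile(r"\[([^\]]+)\]")


def citation_accuracy(case: EvalCase) -> float:
    """Fraction of citations in the answer that point at a retrieved chunk."""
    cited = CITATION.findall(case.answer)
    if not cited:
        return 0.0  # an uncited answer fails this metric
    return sum(c in case.retrieved_ids for c in cited) / len(cited)


def fact_recall(case: EvalCase) -> float:
    """Crude stand-in for faithfulness: share of required facts present in the answer."""
    if not case.required_facts:
        return 1.0
    text = case.answer.lower()
    return sum(f.lower() in text for f in case.required_facts) / len(case.required_facts)


def gate(cases, min_citation=0.95, min_facts=0.9) -> bool:
    """Return True only if the averaged scores clear both thresholds."""
    cit = sum(map(citation_accuracy, cases)) / len(cases)
    facts = sum(map(fact_recall, cases)) / len(cases)
    print(f"citation_accuracy={cit:.2f} fact_recall={facts:.2f}")
    return cit >= min_citation and facts >= min_facts


golden = [
    EvalCase("Refund window?", "Refunds are possible within 30 days [kb-7].",
             {"kb-7", "kb-2"}, ["30 days"]),
    EvalCase("Support hours?", "Support runs 9-17 CET [kb-3][kb-9].",
             {"kb-3"}, ["9-17"]),
]
passed = gate(golden)
print("deploy allowed" if passed else "deploy blocked")
```

In this example the second answer cites `kb-9`, which was never retrieved, so citation accuracy averages 0.75 and the deploy is reported as blocked. In CI, `sys.exit(0 if passed else 1)` would turn that result into a failing step.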
See our LLM & GenAI services or get in touch.