LLM & Generative AI
We design and ship generative-AI systems that hold up under real traffic, real data, and real compliance review.
Where we help
Retrieval-augmented generation (RAG)
Grounded answers over your own knowledge base — with citations, access control, and evaluation. We build the ingestion, chunking, retrieval, and re-ranking stack, then prove quality with offline and online metrics.
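To make the moving parts concrete, here is a minimal sketch of chunking, access-controlled retrieval, and citation. It is illustrative only: the `Chunk` type, the `allowed_groups` label, and the function names are hypothetical. A bag-of-words cosine score stands in for a real embedding model, and the re-ranking step is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str                # source document, doubles as the citation
    text: str
    allowed_groups: frozenset  # hypothetical access-control label


def chunk_document(doc_id, text, allowed_groups, max_words=40, overlap=10):
    """Split a document into overlapping word windows."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(doc_id, " ".join(window), frozenset(allowed_groups)))
    return chunks


def _vector(text):
    # Bag-of-words term counts; a stand-in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, user_groups, top_k=3):
    """Return (score, chunk) pairs the user is allowed to see, best first."""
    visible = [c for c in chunks if c.allowed_groups & set(user_groups)]
    q = _vector(query)
    scored = [(_cosine(q, _vector(c.text)), c) for c in visible]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


corpus = chunk_document(
    "hr-handbook",
    "Employees accrue vacation days monthly and may carry over five days.",
    {"staff"},
) + chunk_document(
    "board-minutes",
    "The board approved the vacation policy budget for next year.",
    {"board"},
)

hits = retrieve("How many vacation days carry over?", corpus, user_groups={"staff"})
for score_value, chunk in hits:
    print(f"[{chunk.doc_id}] {score_value:.2f} {chunk.text}")
```

Filtering by group before scoring means a restricted chunk never reaches the prompt, so it cannot leak into an answer or a citation.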
Agents & workflows
Tool-using agents that automate multi-step work: ticket triage, document processing, internal copilots. We scope agency tightly, add guardrails, and keep a human in the loop where it matters.
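As a rough illustration of tightly scoped agency, the sketch below runs a plan through a tool allowlist, a step budget, and a human approval hook. All names (`Tool`, `GuardedExecutor`, `needs_approval`) are hypothetical, and the plan is a fixed list here, where a real agent would have a model propose each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    needs_approval: bool = False  # route through a human before executing


class GuardedExecutor:
    def __init__(self, tools, approve, max_steps=5):
        self.tools = {tool.name: tool for tool in tools}  # the allowlist
        self.approve = approve      # callback: (tool_name, argument) -> bool
        self.max_steps = max_steps  # hard cap on steps per run
        self.log = []

    def execute(self, plan):
        for step, (tool_name, argument) in enumerate(plan):
            if step >= self.max_steps:
                self.log.append("stopped: step budget exhausted")
                break
            tool = self.tools.get(tool_name)
            if tool is None:
                self.log.append(f"blocked: {tool_name} is not on the allowlist")
                continue
            if tool.needs_approval and not self.approve(tool_name, argument):
                self.log.append(f"rejected by reviewer: {tool_name}")
                continue
            self.log.append(f"{tool_name}: {tool.run(argument)}")
        return self.log


tools = [
    Tool(
        "classify_ticket",
        lambda text: "billing" if "invoice" in text.lower() else "general",
    ),
    Tool(
        "issue_refund",
        lambda ticket_id: f"refund issued for {ticket_id}",
        needs_approval=True,
    ),
]

# The reviewer callback declines everything in this demo.
executor = GuardedExecutor(tools, approve=lambda name, arg: False)
log = executor.execute(
    [
        ("classify_ticket", "Invoice charged twice"),
        ("issue_refund", "T-1001"),
        ("delete_account", "T-1001"),
    ]
)
print("\n".join(log))
```

The low-risk classification step runs unattended, the refund waits on a reviewer, and a tool that was never registered cannot run at all.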
Fine-tuning & adaptation
When prompting is not enough, we fine-tune or adapt open models on your domain data — on infrastructure you control.
Evaluation & guardrails
Every system ships with an eval harness: golden datasets, regression tests, and production monitoring for hallucination, cost, and latency.
Typical outcomes
- A support copilot that deflects 40%+ of tier-1 tickets with cited answers.
- A document-processing pipeline that cuts manual handling from hours to seconds.
- An internal RAG assistant deployed in your EU cloud region, with GDPR-compliant data handling.
How we build
# A grounded answer is only as good as its evaluation.
# Every nicojahn RAG engagement ships with a regression eval suite.
from nicojahn.eval import GoldenSet, score

results = score(
    system="support-copilot",
    dataset=GoldenSet.load("tier1-tickets-v3"),
    metrics=["faithfulness", "answer_relevance", "citation_accuracy"],
)

assert results.faithfulness > 0.95  # gate the deploy on quality
We default to the most capable models for the task and keep the architecture provider-flexible, so you are never locked to one vendor.
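One way to keep an architecture provider-flexible is to have application code depend on a small interface and not on any vendor SDK. The sketch below assumes a hypothetical `ChatProvider` protocol with two toy implementations standing in for real model APIs; none of these names come from an actual library.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Hypothetical interface that every model backend must satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Toy stand-in for one hosted model API."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseProvider:
    """Toy stand-in for a second, interchangeable backend."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Registry keyed by a config value, so switching vendors is a config change.
PROVIDERS = {"echo": EchoProvider, "upper": UppercaseProvider}


def build_provider(name: str) -> ChatProvider:
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None


def answer(provider: ChatProvider, question: str) -> str:
    # Application logic sees only the interface, never a vendor SDK.
    return provider.complete(f"Answer concisely: {question}")


print(answer(build_provider("echo"), "What is RAG?"))
print(answer(build_provider("upper"), "What is RAG?"))
```

Because `answer` only knows about `ChatProvider`, the same eval suite can be run against each backend before a switch.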