LLM & Generative AI

We design and ship generative-AI systems that hold up under real traffic, real data, and real compliance review.

Where we help

Retrieval-augmented generation (RAG)

Grounded answers over your own knowledge base — with citations, access control, and evaluation. We build the ingestion, chunking, retrieval, and re-ranking stack, then prove quality with offline and online metrics.
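As a rough illustration of the retrieval step described above, the sketch below filters chunks by access group before ranking them, and keeps the document id with every hit so an answer can cite its source. It is a minimal sketch under stated assumptions: the names `Chunk`, `retrieve`, and `allowed_groups` are hypothetical, and bag-of-words cosine similarity stands in for the embedding model and re-ranker a production stack would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str                # kept with the text so answers can cite it
    text: str
    allowed_groups: frozenset  # groups permitted to read this chunk


def tokens(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, user_groups: set, k: int = 3) -> list:
    """Return the top-k chunks the user may see, best match first."""
    # Apply access control before scoring, so restricted text never
    # reaches the ranking step or the prompt.
    visible = [c for c in chunks if c.allowed_groups & set(user_groups)]
    q = tokens(query)
    scored = [(cosine(q, tokens(c.text)), c) for c in visible]
    scored = [(s, c) for s, c in scored if s > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


corpus = [
    Chunk("hr-1", "Vacation policy: employees get 30 days of paid leave",
          frozenset({"hr", "staff"})),
    Chunk("fin-1", "Salary bands for engineering leave room for negotiation",
          frozenset({"finance"})),
    Chunk("it-1", "Reset your password in the self-service portal",
          frozenset({"staff"})),
]

hits = retrieve("how many days of paid leave", corpus, {"staff"}, k=2)
print([h.doc_id for h in hits])
```

Filtering before scoring is the design choice worth noting here: a chunk the user cannot read is never ranked, so it cannot leak into a generated answer.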

Agents & workflows

Tool-using agents that automate multi-step work: ticket triage, document processing, internal copilots. We scope agency tightly, add guardrails, and keep a human in the loop where it matters.
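One way to picture tightly scoped agency is a tool registry with an allowlist and an approval hook for risky actions. The sketch below is illustrative only: `Tool`, `execute`, and the `approve` callback are hypothetical names, and a real deployment would route the approval to a reviewer queue, not a function argument.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]
    needs_approval: bool = False  # True for actions with side effects


def execute(
    tool_name: str,
    args: dict,
    tools: dict,
    approve: Callable[[str, dict], bool],
) -> str:
    """Run a tool the agent requested, enforcing the guardrails."""
    tool = tools.get(tool_name)
    if tool is None:
        # Allowlist: anything not registered is refused outright.
        return f"refused: unknown tool {tool_name!r}"
    if tool.needs_approval and not approve(tool_name, args):
        # Human in the loop: risky tools run only after sign-off.
        return f"refused: {tool_name!r} was not approved"
    return tool.run(args)


tools = {
    "lookup_ticket": Tool("lookup_ticket", lambda a: f"found {a['id']}"),
    "issue_refund": Tool("issue_refund", lambda a: "refunded",
                         needs_approval=True),
}

# Read-only tools run directly; the refund waits for a human decision.
print(execute("lookup_ticket", {"id": "T-1"}, tools, lambda n, a: False))
print(execute("issue_refund", {"id": "T-1"}, tools, lambda n, a: False))
```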

Fine-tuning & adaptation

When prompting is not enough, we fine-tune or adapt open models on your domain data — on infrastructure you control.

Evaluation & guardrails

Every system ships with an eval harness: golden datasets, regression tests, and production monitoring for hallucination, cost, and latency.
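A regression test in this sense can be as simple as comparing the current run against a stored baseline. The following is a minimal sketch, assuming every metric is higher-is-better and lives in a plain dictionary; the function name `regressions` and the tolerance value are illustrative choices, not part of any particular harness.

```python
def regressions(baseline: dict, current: dict, tolerance: float = 0.01) -> list:
    """List the metrics that dropped more than `tolerance` below baseline.

    Assumes higher is better for every metric. A metric missing from
    the current run counts as a regression.
    """
    failed = []
    for metric, old in baseline.items():
        new = current.get(metric)
        if new is None or new < old - tolerance:
            failed.append(metric)
    return failed


baseline = {"faithfulness": 0.96, "citation_accuracy": 0.90}
current = {"faithfulness": 0.955, "citation_accuracy": 0.85}

failed = regressions(baseline, current)
if failed:
    print("blocking deploy, regressed metrics:", failed)
```

Cost and latency can be tracked with the same pattern by inverting the comparison, since lower is better for both.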

Typical outcomes

A support copilot that deflects 40%+ of tier-1 tickets with cited answers.
A document-processing pipeline that cuts manual handling from hours to seconds.
An internal RAG assistant deployed in your EU cloud region, GDPR-compliant.

How we build

# A grounded answer is only as good as its evaluation.
# Every nicojahn RAG engagement ships with a regression eval suite.
from nicojahn.eval import GoldenSet, score

results = score(
    system="support-copilot",
    dataset=GoldenSet.load("tier1-tickets-v3"),
    metrics=["faithfulness", "answer_relevance", "citation_accuracy"],
)
assert results.faithfulness > 0.95  # gate the deploy on quality

We default to the most capable models for the task and keep the architecture provider-flexible, so you are never locked to one vendor.
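Provider flexibility usually comes down to a thin interface between application code and the model. The sketch below shows one way to express that with a structural type; `ChatModel`, `EchoModel`, `ShoutingModel`, and `answer` are hypothetical names, and the two stand-in classes replace real vendor clients so the example runs without network access.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the application depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one vendor's client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ShoutingModel:
    """Stand-in for a second vendor's client."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer(model: ChatModel, question: str, context: str) -> str:
    """Build the prompt once; any conforming model can serve it."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return model.complete(prompt)


# Swapping providers is a one-argument change at the call site.
for model in (EchoModel(), ShoutingModel()):
    print(answer(model, "What is the leave policy?", "30 days paid leave"))
```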

