Insights

What we're learning, writing, and occasionally arguing about. Engineering posts, opinion pieces, and case notes from our work.

RAG evaluation: the tests we run before shipping any LLM feature

A production-grade evaluation harness for retrieval-augmented generation: golden datasets, LLM-as-judge scoring, retrieval metrics, and regression gates.

Jan 22, 2026 · Anthra AI Team · 5 min read
