Insights
What we're learning, writing, and occasionally arguing about. Engineering posts, opinion pieces, and case notes from our work.
AI
RAG evaluation: the tests we run before shipping any LLM feature
A production-grade evaluation harness for retrieval-augmented generation: golden datasets, LLM-as-judge scoring, retrieval metrics, and regression gates.
Jan 22, 2026 · Anthra AI Team · 5 min read
Newsletter
Get new engineering essays and practical AI notes in your inbox.