Insights

What we're learning, writing, and occasionally arguing about. Engineering posts, opinion pieces, and case notes from our work.

AI

RAG evaluation: the tests we run before shipping any LLM feature

A production-grade evaluation harness for retrieval-augmented generation. Golden datasets, LLM-as-judge, retrieval metrics, and regression gates.

Jan 22, 2026 · Anthra AI Team · 5 min read

Newsletter

Get new engineering essays and practical AI notes in your inbox.