The vector database space has matured. Two years ago it was Pinecone vs. Weaviate and everyone was building their own. Today there are five or six credible options, and "just use pgvector" is increasingly the right answer.
This is how we pick, with actual benchmarks and cost comparisons, based on engagements we've run over the past year.
The short answer
- Default to `pgvector` on Postgres. It's enough for 80% of RAG systems up to ~10M vectors and will reduce operational complexity.
- Reach for `Qdrant` or `Weaviate` (self-hosted) when you exceed pgvector's performance or feature ceiling.
- Reach for `Pinecone` when you want zero ops and have budget.
- Reach for `Vespa` for hybrid search at massive scale.
- Don't build your own. You have better problems to solve.
Full analysis below.
What actually matters in a vector DB
Before comparing, be clear what you're evaluating:
1. Query performance at your scale — p95 latency and sustained queries per second
2. Recall — are you getting the true nearest vectors, or approximations that drop relevant results?
3. Filter performance — how fast is "vectors near X where tenant_id = Y"?
4. Hybrid search — combining vector similarity with keyword (BM25) matching
5. Cost — fully loaded, including compute, storage, and ops
6. Operational complexity — how hard is it to run, back up, and upgrade?
7. Integration — does it fit your existing stack?
8. Multi-tenancy — per-tenant isolation, if you need it
9. Metadata filtering — pre-filter vs. post-filter, complexity of queries
Most "best vector DB" comparisons only measure #1. In practice, #5 and #6 drive the decision.
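Recall is straightforward to measure once you have exact nearest neighbors from a brute-force pass over a sample of queries. A minimal sketch (pure Python; the ID lists are hypothetical):

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the ANN index returned.

    exact_ids:  IDs ranked by brute-force (exact) distance
    approx_ids: IDs returned by the ANN index under test
    """
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / k

# Example: the index returns 9 of the 10 true nearest neighbors.
exact = list(range(10))                      # ground truth
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 99]    # one miss
print(recall_at_k(exact, approx))            # 0.9
```

Averaging this over a few hundred real queries gives the recall@10 figure the benchmark below targets.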
The contenders
pgvector (Postgres extension)
What it is: An extension adding vector columns and ANN indexes (HNSW and IVFFlat) to Postgres.
Strengths:
- You already run Postgres. Operationally invisible.
- Full SQL for filtering — arbitrary `WHERE` clauses on metadata work naturally.
- Transactional consistency with the rest of your data.
- Cheap at small to medium scale.
- Mature ecosystem (pgvector is now 4+ years old, in active development).
Weaknesses:
- Performance degrades with very large datasets (>20M vectors gets tricky).
- Index builds can be slow on large tables.
- Less optimized than purpose-built vector DBs for pure vector workloads.
Our take: Start here. You'll probably never leave.
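The query pattern pgvector makes natural is "nearest vectors subject to an arbitrary SQL filter" — roughly `SELECT id FROM docs WHERE tenant_id = %s ORDER BY embedding <-> %s LIMIT 10`, where `<->` is pgvector's Euclidean distance operator. A brute-force sketch of the same semantics in pure Python, for illustration (table and field names are hypothetical; a real HNSW index replaces the full scan):

```python
import math

def l2(a, b):
    """Euclidean distance, what pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_knn(rows, query_vec, k=10, tenant_id=None):
    """Brute-force equivalent of:
      SELECT id FROM docs WHERE tenant_id = %s
      ORDER BY embedding <-> %s LIMIT k;
    """
    candidates = [r for r in rows if tenant_id is None or r["tenant_id"] == tenant_id]
    candidates.sort(key=lambda r: l2(r["embedding"], query_vec))
    return [r["id"] for r in candidates[:k]]

rows = [
    {"id": 1, "tenant_id": "a", "embedding": [0.0, 0.0]},
    {"id": 2, "tenant_id": "b", "embedding": [0.1, 0.0]},
    {"id": 3, "tenant_id": "a", "embedding": [1.0, 1.0]},
]
print(filtered_knn(rows, [0.0, 0.0], k=2, tenant_id="a"))  # [1, 3]
```

The point is that the filter is ordinary SQL: any predicate you can write in a `WHERE` clause composes with the vector ordering.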
Qdrant (open source + cloud)
What it is: Purpose-built vector DB written in Rust. Open-source with a managed cloud.
Strengths:
- Very fast. Among the best performance/$ ratios.
- Excellent filter performance (payload indexing separate from vector indexes).
- Good multi-tenancy primitives.
- Rich Python/TS clients.
- Self-hostable via Docker, k8s helm chart.
Weaknesses:
- Smaller ecosystem than Pinecone.
- Self-hosted requires real Kubernetes knowledge at scale.
- Managed cloud is newer than Pinecone's offering.
Our take: Best open-source option in 2026. Qdrant Cloud is a solid managed choice.
Weaviate (open source + cloud)
What it is: Go-based vector DB with a schema-first design, built-in hybrid search, and strong module ecosystem.
Strengths:
- Excellent hybrid search out of the box (BM25 + vector).
- Built-in modules for generating embeddings (no separate embedding service needed).
- Good GraphQL and REST APIs.
- Mature managed cloud (WCS).
Weaknesses:
- More opinionated schema model — feels heavier than Qdrant for simple cases.
- Performance is solid but not class-leading.
- Memory-hungry at scale.
Our take: Strong choice if you need hybrid search and like the schema-first approach. Slight edge over Qdrant for hybrid-heavy use cases.
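Hybrid search is typically implemented as rank fusion of a BM25 result list and a vector result list. Reciprocal Rank Fusion (RRF) is a common scheme — Weaviate's ranked fusion works along these lines, though this is a generic sketch, not its exact implementation:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank).

    Documents ranked highly by either list float to the top; k=60 is the
    conventional damping constant from the original RRF paper.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["d3", "d1", "d7"]   # keyword ranking
vector_hits = ["d1", "d2", "d3"]   # similarity ranking
print(rrf_fuse([bm25_hits, vector_hits]))  # ['d1', 'd3', 'd2', 'd7']
```

Because RRF only uses ranks, you never have to normalize BM25 scores against cosine similarities, which is why it is a popular default.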
Pinecone (managed only)
What it is: The original purpose-built vector DB cloud service. Fully managed, serverless option available.
Strengths:
- Zero ops. Truly set-and-forget.
- Fast, with consistent latency SLAs.
- Mature ecosystem and SDK.
- Serverless pricing model (v2) is genuinely competitive.
Weaknesses:
- Closed-source. Vendor lock-in is real.
- Historically expensive (v2 serverless helps).
- No self-hosted option.
- Schema flexibility is limited vs. open-source alternatives.
Our take: The right choice if ops time is your bottleneck and you have budget. Otherwise, one of the open-source options with a managed tier wins.
Vespa (open source)
What it is: Yahoo's production search/ranking engine, open-sourced. Handles hybrid search, ML ranking, and vector search at massive scale.
Strengths:
- Battle-tested at billions of vectors, billions of queries per day.
- Native ML ranking integration (tensor evaluation, phased ranking).
- Hybrid search is a first-class citizen.
- Flexible document model.
Weaknesses:
- Steep learning curve. Custom application packages, XML config.
- Operational complexity is real.
- Overkill for most use cases.
Our take: Only pick Vespa if you have serious search needs (complex ranking, true hybrid, >100M vectors) and a team capable of running it.
Honorable mentions
- Milvus / Zilliz — solid and widely used, especially in Asia; the managed offering is Zilliz Cloud. A good choice, but we've had more recent engagements with Qdrant and Weaviate, so less hands-on experience here.
- Elasticsearch / OpenSearch — vector support is decent now; worth considering if you already run it for lexical search.
- Chroma — popular in early-stage prototyping but not yet a production choice for us. Watch this space.
- LanceDB — embedded (SQLite-style) vector DB. Interesting for edge / local use.
A concrete benchmark
We ran this benchmark on the MS MARCO dataset (1M passages, 384-dim embeddings) on a single machine (c7g.4xlarge, 16 vCPU, 32GB RAM):
| System | Index build time | p50 query latency | p95 query latency | QPS @ 4 clients |
|---|---|---|---|---|
| pgvector (HNSW) | 48 min | 8 ms | 22 ms | 480 |
| Qdrant | 19 min | 3 ms | 9 ms | 1,200 |
| Weaviate | 26 min | 5 ms | 14 ms | 850 |
| Pinecone (p2.x1) | N/A (managed) | 12 ms | 35 ms | 600 |
| Vespa | 41 min | 4 ms | 11 ms | 1,050 |
All targeted recall@10 ≥ 0.95. Pinecone latency includes network round-trip from a same-region EC2 client.
At this scale (1M vectors), all are viable. pgvector is the slowest but fast enough for most applications.
Benchmarks are sensitive to dataset, query patterns, hardware, and tuning. Run your own on a representative workload before committing. These numbers are directional.
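When you do run your own, measure percentiles rather than averages — tail latency is what users feel. A minimal timing harness (pure Python; `query` is whatever call you're benchmarking):

```python
import statistics
import time

def latency_percentiles(query, n=1000):
    """Time n calls of `query` and return (p50, p95) in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[94]                    # p50, p95

# Usage: replace the lambda with a real client call, e.g. a Qdrant search.
p50, p95 = latency_percentiles(lambda: sum(range(1000)), n=500)
print(f"p50={p50:.3f}ms  p95={p95:.3f}ms")
```

For throughput numbers, run the same loop from several concurrent clients, as the QPS column above does with 4.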
Cost comparison (50M vectors, 768-dim)
At 50M vectors (typical for a medium-sized RAG system over a full-text document corpus):
| System | Approximate monthly cost |
|---|---|
| pgvector on RDS (db.r6g.4xlarge + 500GB gp3) | $1,200 |
| Qdrant self-hosted (3× m6g.2xlarge + EBS) | $850 |
| Qdrant Cloud (dedicated, similar sizing) | $1,400 |
| Weaviate self-hosted | $900 |
| Weaviate Cloud (serverless) | $1,100 |
| Pinecone (serverless) | $900–1,800 depending on traffic |
| Vespa self-hosted | $950 |
Self-hosted options have lower direct costs but add ops time. Managed options have higher direct cost but save engineering hours.
A DevOps engineer costs $200k+/year loaded. If a managed service saves 4 hours/week of ops, that's $20k/year — pays for most managed offerings at medium scale.
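That break-even is worth computing explicitly for your own numbers. A back-of-envelope helper (the defaults are the article's assumptions — $200k loaded cost, ~2,000 working hours and ~50 working weeks per year — not universal constants):

```python
def managed_breakeven(ops_hours_saved_per_week,
                      loaded_cost_per_year=200_000,
                      work_hours_per_year=2_000,
                      weeks_per_year=50):
    """Annual dollar value of the ops time a managed service saves."""
    hourly = loaded_cost_per_year / work_hours_per_year   # ~$100/hr loaded
    return ops_hours_saved_per_week * weeks_per_year * hourly

print(managed_breakeven(4))  # 20000.0 — a $20k/year savings
```

If the managed premium over self-hosting is below that figure, the managed tier wins on total cost.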
Decision matrix
| If you... | Pick |
|---|---|
| Already run Postgres, have < 10M vectors | pgvector |
| Need best raw performance, self-hostable | Qdrant |
| Need strong hybrid search (vector + BM25) | Weaviate or Vespa |
| Want zero ops, have budget | Pinecone (serverless) |
| Have massive scale (100M+ vectors, complex ranking) | Vespa |
| Need SQL-level filtering flexibility | pgvector |
| Are building a proof-of-concept | pgvector or Qdrant |
| Have a multi-tenant SaaS (per-tenant isolation) | Qdrant or Pinecone (namespaces) |
Don't migrate prematurely
The most common mistake we see: teams migrate from pgvector to a "real" vector DB because it's the trendy architecture. Usually they're at 1M vectors and pgvector is fine.
Signs you should migrate off pgvector:
- p95 vector query latency > 200ms with proper HNSW tuning
- Index build times are blocking development
- You're fighting Postgres planner to get consistent performance
- You're exceeding 50M vectors and growing fast
Signs you should stay on pgvector:
- Queries are fast enough
- You value transactional consistency with the rest of your data
- Ops complexity is your bottleneck
- You're spending more time picking a vector DB than shipping features
Implementation tips regardless of choice
- **Chunk carefully.** The right chunk size matters more than the database. 256-512 tokens with 10-20% overlap is a solid default; tune based on your data.
- **Hybrid search almost always helps.** Pure vector search misses keyword matches humans expect. BM25 + vector with a re-ranker is the modern pattern.
- **Re-ranking improves quality cheaply.** A cross-encoder re-ranker on the top 50 results often gives a bigger quality boost than switching vector DBs.
- **Test on real queries.** Synthetic benchmarks lie. Build an evaluation set from real user queries and measure recall@K on it.
- **Metadata filters are where real systems live or die.** "Most similar vectors" is rarely enough. "Most similar within this user's documents from last 30 days" is typical. Evaluate filter performance seriously.
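The chunking default is easy to sketch. A sliding-window chunker over whitespace tokens — a rough stand-in for real tokenizer tokens; swap in your embedding model's tokenizer in practice:

```python
def chunk(text, size=384, overlap=64):
    """Split text into windows of `size` tokens, with `overlap` tokens
    shared between consecutive chunks (token = whitespace-separated word
    here, as an approximation)."""
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))
parts = chunk(doc)
print(len(parts))  # 3
```

The 384/64 defaults sit in the 256-512 token, 10-20% overlap range suggested above; treat them as a starting point to tune against your evaluation set.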
Closing
Most teams spend too much time picking a vector DB and not enough time on retrieval quality. The infrastructure choice matters less than the chunking strategy, the embedding model, the re-ranker, and the evaluation harness.
Use pgvector for as long as you can, then Qdrant or Weaviate when you can't. Pinecone if ops is your bottleneck. Vespa if you're operating at Yahoo scale.
30-day implementation checklist
If you need to move from analysis to execution quickly:
- Baseline current query latency and recall on real traffic.
- Build a representative 100-200 query evaluation set.
- Test at least two contenders against your real filters and tenancy model.
- Compare total cost including operations overhead, not infrastructure only.
- Launch with clear migration rollback and quality gates.
The winning choice is the one your team can operate reliably while meeting product-level latency and quality targets.
Related: RAG evaluation harness, our legal tech RAG case study, and when fine-tuning is worth it.