AI Infrastructure

Vector Databases Compared for Production RAG

Pinecone, Weaviate, pgvector, Qdrant, Milvus — what to pick at each scale band, and how to avoid the "new vector DB" tax when Postgres would have been enough.

TL;DR

Under 1M vectors and already on Postgres: pgvector. 1–10M with a preference for managed services: Pinecone or Weaviate Cloud. 10M+ self-hosted: Qdrant or Milvus. Hybrid SQL + vector queries: pgvector wins.

Pick by Vector Count and Operational Maturity

🌱

Band 1: Up to 1M vectors

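Winner: pgvector. You almost certainly already run Postgres. HNSW indexes can hit sub-100ms p95 at this size. Hybrid keyword+vector queries (tsvector full-text search plus cosine distance) run in the same SQL statement as your joins and filters, which dedicated DBs cannot match. Added operational cost: near zero.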
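A minimal sketch of the tsvector + cosine combination described above, written as SQL held in Python strings. The table and column names (docs, body, body_tsv, embedding), the 1536 dimension, the candidate count, and the 0.7/0.3 weights are all assumptions for illustration; ts_rank is not normalized, so treat the blended score as a rough starting point rather than a tuned ranking.

```python
# Sketch only. Table and column names, the vector dimension, and the
# 0.7/0.3 weights are illustrative assumptions, not from the article.

SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS docs (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    body_tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
    embedding vector(1536)
);

-- HNSW index for cosine distance, GIN index for full-text search.
CREATE INDEX IF NOT EXISTS docs_embedding_hnsw
    ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS docs_body_tsv_gin
    ON docs USING gin (body_tsv);
"""

HYBRID_SQL = """
WITH vec AS (   -- nearest neighbours by cosine distance (<=>)
    SELECT id, 1 - (embedding <=> %(qvec)s::vector) AS vec_score
    FROM docs
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT %(candidates)s
),
kw AS (         -- keyword matches ranked by ts_rank
    SELECT id,
           ts_rank(body_tsv, websearch_to_tsquery('english', %(q)s)) AS kw_score
    FROM docs
    WHERE body_tsv @@ websearch_to_tsquery('english', %(q)s)
    ORDER BY kw_score DESC
    LIMIT %(candidates)s
)
SELECT id,
       0.7 * COALESCE(vec.vec_score, 0) + 0.3 * COALESCE(kw.kw_score, 0) AS score
FROM vec FULL OUTER JOIN kw USING (id)
ORDER BY score DESC
LIMIT %(k)s;
"""


def to_vector_literal(vec):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def hybrid_params(query_text, query_vec, k=10, candidates=50):
    """Build the named parameters that HYBRID_SQL expects."""
    return {
        "q": query_text,
        "qvec": to_vector_literal(query_vec),
        "k": k,
        "candidates": candidates,
    }
```

With a driver that accepts named %(name)s placeholders (psycopg is one), this would run as cursor.execute(HYBRID_SQL, hybrid_params("red shoes", query_vec)).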

🌿

Band 2: 1M–10M vectors

Managed preference: Pinecone (fastest to ship, predictable price) or Weaviate Cloud (richer hybrid search, open-source path out).
Self-hosted preference: Qdrant (simpler ops than Milvus) or pgvector if your team is Postgres-fluent.

🌳

Band 3: 10M–100M vectors

Managed: Pinecone serverless or Weaviate Cloud. Cost becomes a real factor, so calculate it per query and per million vectors stored.
Self-hosted: Qdrant or Milvus. Both need a platform team and 24/7 monitoring.
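The per-query and per-million-stored calculation can be sketched as a small cost model. The prices below are made-up placeholders, not any vendor's actual rates; substitute figures from the pricing page you are evaluating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical managed-service pricing; both fields are placeholders."""
    usd_per_million_vectors_month: float
    usd_per_million_queries: float


def monthly_cost(vectors: int, queries_per_month: int, pricing: Pricing) -> float:
    """Storage cost plus query cost for one month, in USD."""
    storage = vectors / 1_000_000 * pricing.usd_per_million_vectors_month
    queries = queries_per_month / 1_000_000 * pricing.usd_per_million_queries
    return storage + queries


# Placeholder numbers, chosen only to make the arithmetic easy to follow.
example = Pricing(usd_per_million_vectors_month=10.0, usd_per_million_queries=5.0)

for n_vectors in (1_000_000, 10_000_000, 100_000_000):
    cost = monthly_cost(n_vectors, queries_per_month=3_000_000, pricing=example)
    print(f"{n_vectors:>11,} vectors: ${cost:,.2f}/month")
```

Running the same function across the scale bands before committing shows whether storage or query volume dominates the bill at your expected size.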

🏔️

Band 4: 100M+ vectors

Managed: Pinecone or Vespa Cloud.
Self-hosted: Milvus is the mature billion-scale option; Qdrant is closing the gap fast. At this scale you are paying SREs to run it, and that is a real budget line.
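The four bands above can be written down as a lookup, which is handy for a capacity-planning script. The function name and the choice to treat each upper bound as inclusive are assumptions; the recommendations themselves are copied from the bands.

```python
def recommend(vectors: int, managed: bool) -> list[str]:
    """Return the article's shortlist for a vector count and hosting preference.

    Upper bounds are treated as inclusive (an assumption): exactly 1M
    vectors still counts as Band 1.
    """
    if vectors <= 1_000_000:        # Band 1
        return ["pgvector"]
    if vectors <= 10_000_000:       # Band 2
        return ["Pinecone", "Weaviate Cloud"] if managed else ["Qdrant", "pgvector"]
    if vectors <= 100_000_000:      # Band 3
        return ["Pinecone serverless", "Weaviate Cloud"] if managed else ["Qdrant", "Milvus"]
    # Band 4
    return ["Pinecone", "Vespa Cloud"] if managed else ["Milvus", "Qdrant"]


print(recommend(500_000, managed=True))
print(recommend(5_000_000, managed=False))
print(recommend(250_000_000, managed=False))
```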

🔍

Hybrid Search (Keyword + Vector)

pgvector + tsvector wins when hybrid queries must join against structured filters. Weaviate has the best dedicated hybrid implementation (BM25 + vector with a tunable alpha weight). Pinecone added hybrid search but still lags. Qdrant supports it by combining sparse and dense vectors, and its payload filtering is well-engineered.
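To make the "tunable alpha" idea concrete, here is a generic sketch of alpha-weighted score fusion: alpha = 1 uses only the vector score, alpha = 0 only the keyword score. It uses simple min-max normalization and is not a reproduction of any product's exact fusion algorithm; the document IDs and scores are invented.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def alpha_fusion(keyword: dict[str, float],
                 vector: dict[str, float],
                 alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha * vector + (1 - alpha) * keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw, vec = min_max(keyword), min_max(vector)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1.0 - alpha) * kw.get(doc, 0.0)
        for doc in kw.keys() | vec.keys()
    }
    # Highest score first; ties broken by document ID for stable output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Invented scores: BM25-style keyword scores and cosine similarities.
keyword_scores = {"sku-123": 9.0, "guide": 2.0}
vector_scores = {"guide": 0.9, "faq": 0.5}

print(alpha_fusion(keyword_scores, vector_scores, alpha=0.0)[0][0])
print(alpha_fusion(keyword_scores, vector_scores, alpha=1.0)[0][0])
```

Sweeping alpha against a labeled set of real queries is the usual way to pick a value; exact-match-heavy workloads tend to want it lower.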

📊

The Lock-in Question

High lock-in: Pinecone (proprietary API).
Medium: Weaviate Cloud (the engine is open source, so you can export and self-host).
Low: pgvector, Qdrant, Milvus (open-source, portable).

Why pgvector Is the Right Starting Point

Most enterprise RAG projects start with 50K–500K vectors and stay there. If you already run Postgres, adding pgvector is a 30-minute operation. You get vector search alongside your relational data — the same query can filter by tenant, date range, status, and vector similarity in one round-trip. Dedicated vector DBs require you to denormalize all your filters into the vector record, which becomes painful as your filter logic grows.
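The one-round-trip query described above might look like the following sketch. The documents table, its columns (tenant_id, status, created_at, title, embedding), and the helper name are hypothetical; the <=> operator is pgvector's cosine-distance operator.

```python
from datetime import date

# Hypothetical schema: documents(id, tenant_id, status, created_at, title, embedding).
FILTERED_SEARCH_SQL = """
SELECT id, title, embedding <=> %(qvec)s::vector AS cosine_distance
FROM documents
WHERE tenant_id = %(tenant_id)s
  AND status = %(status)s
  AND created_at >= %(since)s
ORDER BY embedding <=> %(qvec)s::vector
LIMIT %(k)s
"""


def filtered_search_params(tenant_id: int, status: str, since: date,
                           query_vec: list[float], k: int = 10) -> dict:
    """Build named parameters; the vector is sent in pgvector's text form."""
    return {
        "tenant_id": tenant_id,
        "status": status,
        "since": since,
        "qvec": "[" + ",".join(repr(float(x)) for x in query_vec) + "]",
        "k": k,
    }


params = filtered_search_params(42, "published", date(2024, 1, 1), [0.1, 0.2])
print(params["qvec"])
```

Relational filters and similarity ranking live in one statement, so adding a new filter is a WHERE clause change rather than a re-index of denormalized metadata.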

The default architecture: start with pgvector and measure p95 latency monthly. When it crosses your SLA, migrate to a dedicated vector DB, and not before. In our experience, roughly 70% of clients never need to migrate.
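The monthly check can be as simple as the sketch below: compute p95 from a sample of query latencies and compare it to the SLA. The function names and the nearest-rank percentile method are choices made for this example; the latency samples would come from your own logs or APM tool.

```python
def p95(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_migrate(samples_ms: list[float], sla_ms: float) -> bool:
    """True only when measured p95 latency exceeds the SLA."""
    return p95(samples_ms) > sla_ms


# Synthetic data: 100 queries taking 1 ms, 2 ms, ..., 100 ms.
latencies = [float(ms) for ms in range(1, 101)]
print(p95(latencies), should_migrate(latencies, sla_ms=100.0))
```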

Common Mistakes

⚠️

Picking Pinecone for the Demo

Pinecone's free tier and developer experience are excellent. Two years later, at production scale, the bill is large and migration is expensive. If you might exceed 1M vectors, model the cost at that scale before committing.

⚠️

Self-Hosting Without a Platform Team

Milvus, Qdrant, and self-hosted Weaviate need patching, monitoring, scaling, and incident response. Without dedicated platform engineering, managed wins on total cost even if hosting fees look higher.

⚠️

Ignoring Embedding Cost

Vector storage is rarely the dominant cost. Embedding generation (compute, API fees) often dwarfs storage at 10M+ vectors. Cache embeddings aggressively; recompute only on content change.
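One way to "recompute only on content change" is to key the cache on a hash of the text. This is a minimal in-memory sketch: EmbeddingCache and fake_embed are hypothetical names, and a real system would persist the hash next to the vector and call an actual embedding model.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Re-embed a document only when its content hash changes."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed_fn = embed_fn
        # doc_id -> (sha256 of content, embedding)
        self._store: dict[str, tuple[str, list[float]]] = {}
        self.embed_calls = 0  # counts cache misses, i.e. paid embedding calls

    def get(self, doc_id: str, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self._store.get(doc_id)
        if cached is not None and cached[0] == digest:
            return cached[1]  # content unchanged: reuse the stored vector
        self.embed_calls += 1
        embedding = self._embed_fn(text)
        self._store[doc_id] = (digest, embedding)
        return embedding


def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model or API call."""
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
cache.get("doc-1", "hello")
cache.get("doc-1", "hello")    # cache hit, no new embedding call
cache.get("doc-1", "hello!")   # content changed, re-embedded
print(cache.embed_calls)
```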

⚠️

Skipping Hybrid Search

Pure vector retrieval misses exact-match queries (product codes, names, IDs). Hybrid (keyword + vector) is required for 80% of real RAG use cases. Pick a DB that supports it natively.
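A small illustration of why this matters, using reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. The document IDs are invented and k = 60 is the conventional default constant; this is a generic sketch, not any particular database's implementation.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Invented example: the user searched for the product code "SKU-4471".
vector_hits = ["returns-policy", "shipping-faq"]   # semantic search missed the SKU page
keyword_hits = ["sku-4471"]                        # exact match found it

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

The exact-match page that vector search never returned still lands near the top of the fused list, which is the behavior product codes, names, and IDs need.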

What to Apply Tomorrow

Start with pgvector if you run Postgres. Measure scale and latency monthly. When you grow into the 1–10M vector band and pgvector no longer meets your latency SLA, migrate to Pinecone (managed, fast to ship) or Qdrant (self-hosted, lower lock-in). Reserve Milvus for self-hosted deployments above 100M vectors. Build hybrid search from day one. Cache embeddings aggressively.