Do I need a dedicated vector database at all?

If you have under ~1M vectors and already run Postgres, pgvector is usually enough. Dedicated vector DBs (Pinecone, Weaviate, Qdrant, Milvus) become worth it at multi-million vectors, sub-50ms latency targets, or hybrid keyword+vector search at scale.

Is Pinecone or Weaviate better?

Pinecone is managed-only and the fastest to ship, with predictable pricing but vendor lock-in. Weaviate is open-source with a managed option and richer hybrid search, but it carries more operational responsibility if self-hosted. Choose Pinecone for fast time-to-market; choose Weaviate for control and customization.

Is pgvector production-ready?

Yes, for moderate scale. pgvector 0.7+ with HNSW indexes handles 1–10M vectors with sub-100ms p95 on right-sized hardware. Above that, latency degrades and dedicated vector DBs win. Hybrid SQL+vector queries are pgvector's killer feature and where it beats every dedicated DB.

What about self-hosted at very large scale?

Milvus and Qdrant lead at 100M+ vectors. Milvus is more mature at billion scale; Qdrant is simpler to operate and has better filtering. Both require a competent platform team, so budget for the SRE time, not just the infrastructure.

Vector Databases Compared for Production RAG

Pinecone, Weaviate, pgvector, Qdrant, Milvus: what to pick at each scale band, and how to avoid the "new vector DB" tax when Postgres would have been enough.

⚡

TL;DR

Under 1M vectors with Postgres: pgvector. 1–10M with managed preference: Pinecone or Weaviate Cloud. 10M+ self-hosted: Qdrant or Milvus. Hybrid SQL+vector: pgvector wins.

Scale Bands

Pick by Vector Count and Operational Maturity

🌱

Band 1: Up to 1M vectors

Winner: pgvector. You almost certainly have Postgres. HNSW indexes hit sub-100ms p95 here. Hybrid keyword+vector queries via tsvector + cosine are unmatched by dedicated DBs. Operational cost: near zero.

🌿

Band 2: 1M–10M vectors

Managed preference: Pinecone (fastest to ship, predictable price) or Weaviate Cloud (richer hybrid search, open-source path out).
Self-hosted preference: Qdrant (simpler ops than Milvus) or pgvector if your team is Postgres-fluent.

🌳

Band 3: 10M–100M vectors

Managed: Pinecone serverless or Weaviate Cloud; cost becomes a real factor. Calculate the cost per query and per million vectors stored.
Self-hosted: Qdrant or Milvus; you need a platform team and 24/7 monitoring.
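
The per-query and per-million-stored calculation can be sketched as a small cost model. This is only an illustration: the names are hypothetical and every price in it is a placeholder assumption, not any vendor's actual rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingAssumptions:
    """Hypothetical unit prices; replace with figures from a real quote."""
    usd_per_million_vectors_stored: float  # per month
    usd_per_million_queries: float


def monthly_cost_usd(vectors: int, queries_per_month: int,
                     pricing: PricingAssumptions) -> float:
    """Estimate monthly spend as storage cost plus query cost."""
    storage = vectors / 1_000_000 * pricing.usd_per_million_vectors_stored
    queries = queries_per_month / 1_000_000 * pricing.usd_per_million_queries
    return storage + queries


# Placeholder prices, chosen only to make the arithmetic easy to follow.
example_pricing = PricingAssumptions(
    usd_per_million_vectors_stored=10.0,
    usd_per_million_queries=5.0,
)

# 50M vectors and 20M queries per month: 50 * 10 + 20 * 5 = 600
print(monthly_cost_usd(50_000_000, 20_000_000, example_pricing))
```

Running the same function at today's volume and at the volume you expect in two years shows whether storage or query traffic drives the bill.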

🏔️

Band 4: 100M+ vectors

Managed: Pinecone or Vespa Cloud.
Self-hosted: Milvus is the mature billion-scale option; Qdrant is closing fast. At this scale you are paying SREs to run it; that is a real budget line.

🔍

Hybrid Search (Keyword + Vector)

pgvector + tsvector wins for joins with structured filters. Weaviate has the best dedicated hybrid implementation (BM25 + vector with tunable alpha). Pinecone has added hybrid search but lags behind. Qdrant supports it through well-engineered payload filtering.
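
To make the "tunable alpha" idea concrete, here is a minimal sketch of weighted score fusion. It is a generic illustration, not Weaviate's actual implementation; the function names, the min-max normalization, and the sample scores are all assumptions.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores: dict[str, float],
                vector_scores: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend the two score sets: alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: "sku-123" is an exact keyword hit with weak semantic similarity.
keyword_scores = {"sku-123": 9.0, "doc-a": 3.0, "doc-b": 1.0}
vector_scores = {"doc-a": 0.9, "doc-b": 0.8, "sku-123": 0.1}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, [doc_id for doc_id, _ in hybrid_rank(keyword_scores, vector_scores, alpha)])
```

With alpha at 1.0 the exact-match document drops to last place, which is the failure mode that hybrid search is meant to prevent.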

📊

The Lock-in Question

High lock-in: Pinecone (proprietary API).
Medium: Weaviate Cloud (open-source export possible).
Low: pgvector, Qdrant, Milvus (open-source, portable).

The Postgres Default

Why pgvector Is the Right Starting Point

Most enterprise RAG projects start with 50K–500K vectors and stay there. If you already run Postgres, adding pgvector is a 30-minute operation. You get vector search alongside your relational data: the same query can filter by tenant, date range, status, and vector similarity in one round-trip. Dedicated vector DBs require you to denormalize all your filters into the vector record, which becomes painful as your filter logic grows.
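
As a rough sketch of that single round-trip, the snippet below builds one parameterized query that combines relational filters with pgvector's cosine-distance operator (`<=>`). The `chunks` table, its columns, and the helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver; nothing here connects to a database.

```python
from datetime import date

# Hypothetical schema: chunks(id, tenant_id, status, created_at, content, embedding vector).
# One-time index; vector_cosine_ops matches the <=> cosine-distance operator.
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)"
)

# Relational filters and similarity ordering in the same statement.
SEARCH_SQL = """
SELECT id, content, embedding <=> %s::vector AS cosine_distance
FROM chunks
WHERE tenant_id = %s
  AND status = %s
  AND created_at >= %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(embedding: list[float]) -> str:
    """Format floats as pgvector's text representation, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in embedding) + "]"


def search_params(embedding: list[float], tenant_id: str, status: str,
                  created_after: date, limit: int = 10) -> tuple:
    """Return parameters in the same order as the placeholders in SEARCH_SQL."""
    vec = to_vector_literal(embedding)
    return (vec, tenant_id, status, created_after, vec, limit)


params = search_params([0.1, 0.2, 0.3], "acme", "published", date(2024, 1, 1))
print(SEARCH_SQL.strip())
print(params)
```

With a driver such as psycopg, the pair would be passed as `cursor.execute(SEARCH_SQL, params)`.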

The default architecture: start with pgvector. Measure p95 latency monthly. When it crosses your SLA, migrate to a dedicated vector DB, and not before. We see ~70% of clients never need to migrate.
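
The monthly check can be as simple as the sketch below. It assumes you already log per-query latencies in milliseconds; the function names, the nearest-rank percentile method, and the sample numbers are illustrative choices.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_migrate(latencies_ms: list[float], sla_ms: float) -> bool:
    """True only when the measured p95 exceeds the latency SLA."""
    return p95(latencies_ms) > sla_ms


# Made-up month of samples: mostly fast queries plus two slow outliers.
samples = [40.0] * 18 + [80.0, 250.0]

print(p95(samples))                         # 80.0
print(should_migrate(samples, sla_ms=100))  # False
print(should_migrate(samples, sla_ms=50))   # True
```

A single 250 ms outlier does not trigger a migration here; the p95 has to cross the SLA.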

Pitfalls

Common Mistakes

⚠️

Picking Pinecone for the Demo

Pinecone's free tier and developer experience are excellent. Two years later, at production scale, the bill is large and migration is expensive. If you might exceed 1M vectors, model the cost at that scale before committing.

⚠️

Self-Hosting Without a Platform Team

Milvus, Qdrant, and self-hosted Weaviate need patching, monitoring, scaling, and incident response. Without dedicated platform engineering, a managed service wins on total cost even if its hosting fees look higher.

⚠️

Ignoring Embedding Cost

Vector storage is rarely the dominant cost. Embedding generation (compute, API fees) often dwarfs storage at 10M+ vectors. Cache embeddings aggressively; recompute only on content change.
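
A minimal sketch of "recompute only on content change" is a cache keyed by a hash of the text. The class name, the in-memory dict, and `fake_embed` are stand-ins; a real setup would call an actual embedding model and persist the cache.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Caches embeddings by content hash so unchanged text is never re-embedded."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed_fn = embed_fn
        self._by_hash: dict[str, list[float]] = {}
        self.misses = 0  # number of times the embedding function was actually called

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._by_hash:
            self._by_hash[key] = self._embed_fn(text)
            self.misses += 1
        return self._by_hash[key]


def fake_embed(text: str) -> list[float]:
    """Stand-in for a paid embedding call; returns a toy 2-dimensional vector."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
cache.get("refund policy v1")
cache.get("refund policy v1")   # unchanged content: served from the cache
cache.get("refund policy v2")   # changed content: embedded again
print(cache.misses)  # 2
```

Three lookups cost two embedding calls; at 10M+ vectors with frequent re-indexing, that difference is where the savings come from.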

⚠️

Skipping Hybrid Search

Pure vector retrieval misses exact-match queries (product codes, names, IDs). Hybrid (keyword + vector) is required for 80% of real RAG use cases. Pick a DB that supports it natively.

Takeaway

What to Apply Tomorrow

Start with pgvector if you run Postgres. Measure scale and latency monthly. Migrate to Pinecone (managed, fast ship) or Qdrant (self-hosted, lower lock-in) when you cross 1–10M vectors. Reserve Milvus for 100M+ self-hosted. Build hybrid search from day one. Cache embeddings aggressively.