Vector Databases for RAG: Comparison, Architecture & Selection Guide
Pricing & limits change. Last reviewed 25 June 2026. Verify vector counts, dimensions, and monthly fees on each vendor's pricing page before committing. Latency figures below are planning ranges - benchmark with your embedding model, metadata filters, and hardware.
At a Glance - Vector Database Guide (2026)
- Primary use in 2026: RAG retrieval, semantic search, recommendations - not a replacement for PostgreSQL transactions
- Prototype: Chroma (local) or pgvector (if you already run Postgres)
- SME private assistant: pgvector or Qdrant with metadata RBAC filters
- Hybrid (vector + keyword): Weaviate or Qdrant sparse+dense
- Zero-ops scale: Pinecone managed - higher $/vector, lowest ops burden
- Evaluate with: Precision@k, MRR, p95 latency, cost per 1M queries - not vendor marketing charts
- Build next: private AI assistant with RAG (architecture, cost, security)
Introduction
Choosing a vector database matters because most production AI features in 2026 - private document chat, semantic product search, support deflection - depend on fast similarity search over embeddings, not keyword matching alone. Vector stores index high-dimensional vectors and return the nearest neighbours to a query embedding in milliseconds.
This page is a selection-focused comparison of vector databases for engineering leads: architecture, workload fit, metadata filtering, hybrid search, hosting models, indicative cost, and how to measure retrieval quality. If you are building a company knowledge assistant, pair this guide with our private AI assistant with RAG article, which covers ingestion, permissions, LLM choice, and LKR cost bands.
What Vector Databases Do (and When You Do Not Need One)
Embeddings map text, images, or records into vectors like [0.02, -0.41, 0.88, …] (often 384-3072 dimensions). A vector database stores those vectors plus metadata and runs approximate nearest neighbour (ANN) search using indexes such as HNSW or IVF.
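To make the nearest-neighbour idea concrete, the sketch below runs an exact (brute-force) search by cosine similarity over three toy vectors. ANN indexes such as HNSW or IVF approximate this ranking without scanning every stored vector. The `StoredVector` type, the `topK` helper, and the sample data are illustrative assumptions, not part of any vendor API.

```typescript
// Exact nearest-neighbour search by cosine similarity (illustrative only).
// Real vector databases avoid this full scan by using ANN indexes.

interface StoredVector {
  id: string;
  values: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], stored: StoredVector[], k: number): { id: string; score: number }[] {
  return stored
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional "embeddings"; production models emit hundreds of dimensions.
const sampleVectors: StoredVector[] = [
  { id: "leave-policy", values: [0.9, 0.1, 0.0] },
  { id: "expense-policy", values: [0.1, 0.9, 0.0] },
  { id: "it-security", values: [0.0, 0.2, 0.9] },
];

const nearest = topK([0.8, 0.2, 0.1], sampleVectors, 2);
console.log(nearest.map((r) => r.id));
```

A full scan like this costs one similarity computation per stored vector on every query, which is why an index becomes necessary as the corpus grows.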
| Use case | Vector DB fit |
|---|---|
| RAG over internal PDFs / wikis | Strong - core retrieval layer |
| Semantic product or content search | Strong |
| Recommendations by similarity | Strong at scale |
| Exact SKU / invoice lookup | Poor - use SQL |
| Simple keyword site search (<50K pages) | Optional - Postgres FTS or Elasticsearch may suffice |
| <10K chunks, single Postgres app | pgvector often enough - avoid extra moving parts |
Vector Search Architecture
Vector search for RAG typically spans five layers. The vector database sits at Layer 2; everything around it affects perceived quality as much as the choice of database.
Sources (PDF, Confluence, DB exports)
          │
          ▼
┌───────────────────┐     chunk + embed      ┌────────────────────────────┐
│ Ingestion pipeline│ ─────────────────────► │ VECTOR DATABASE            │
│ parse · clean     │     metadata tags      │ vectors · text · ACL fields│
└───────────────────┘                        └─────────────┬──────────────┘
                                                           │ top-k + filters
User query ──► embed query ──► hybrid? (BM25 + vector) ────┤
                                                           ▼
                                                  reranker (optional)
                                                           │
                                                           ▼
                                              LLM generation + citations
Architecture decisions that matter
- Chunking: 512-1,024 tokens with 10-20% overlap; store doc_id, department, classification on every vector
- Metadata pre-filter: Apply RBAC filters before ANN search when the DB supports it (Qdrant, Weaviate, Pinecone, pgvector with WHERE)
- Hybrid search: Combine dense vectors with sparse/BM25 when users type exact product codes, legal citations, or SKU numbers
- Reranking: Retrieve top 20-50, rerank to top 5-10 with a cross-encoder before sending context to the LLM
- Embedding model lock-in: Re-embed the entire corpus if you change models or dimensions
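The chunking decision above can be sketched as a sliding window that carries metadata onto every chunk. This is a minimal illustration under stated assumptions: whitespace-separated words stand in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer), and the `Chunk`, `DocMeta`, and `chunkWithOverlap` names are hypothetical.

```typescript
// Sliding-window chunking with overlap (illustrative sketch).
// Assumption: whitespace-separated words stand in for model tokens.

interface DocMeta {
  docId: string;
  department: string;
  classification: string;
}

interface Chunk extends DocMeta {
  index: number;
  text: string;
}

function chunkWithOverlap(text: string, meta: DocMeta, chunkSize: number, overlapRatio: number): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlapRatio < 0 || overlapRatio >= 1) throw new Error("overlapRatio must be in [0, 1)");

  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const overlap = Math.floor(chunkSize * overlapRatio);
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: Chunk[] = [];

  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + chunkSize);
    // Copy the document metadata onto every chunk so filters can use it later.
    chunks.push({ ...meta, index: chunks.length, text: slice.join(" ") });
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Ten placeholder words, 4-word chunks, 25% overlap (1 shared word per boundary).
const demoText = Array.from({ length: 10 }, (_, i) => `w${i}`).join(" ");
const demoChunks = chunkWithOverlap(
  demoText,
  { docId: "doc-1", department: "hr", classification: "internal" },
  4,
  0.25,
);
console.log(demoChunks.map((c) => c.text));
```

With the sizes from the list above, the equivalent call would use a `chunkSize` of 512-1,024 and an `overlapRatio` of 0.1-0.2.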
Vector Databases Comparison (2026)
Side-by-side view of vector databases for RAG and semantic search workloads. Ratings are practitioner shorthand (Strong / Good / Limited), not benchmark winners on every dimension.
| Product | Workload fit | Scale (practical) | Metadata filtering | Hybrid search | Hosting | Cost (indicative) |
|---|---|---|---|---|---|---|
| pgvector (Postgres extension) | RAG on existing Postgres stack; apps with relational + vector queries | Good to ~500K-1M vectors per tuned instance | Strong (SQL WHERE) | Good with Postgres FTS + RRF fusion | Self-hosted, RDS, Supabase, Neon | LKR 8K-25K/mo infra only |
| Chroma | Local dev, LangChain/LlamaIndex prototypes | Limited - fine for thousands to low millions in-memory | Good (where metadata) | Limited native - often vector-only | Embedded, Docker, Chroma Cloud (beta) | Free self-hosted; cloud varies |
| Qdrant | Production RAG, filtering-heavy, on-prem / residency | Strong - millions to billions with tuning | Strong (payload filters) | Good (sparse vectors + dense) | Self-hosted Docker/K8s; Qdrant Cloud | LKR 15K-50K/mo cloud or VM |
| Weaviate | Multi-tenant SaaS search, multimodal, GraphQL teams | Strong | Strong | Strong (built-in BM25 + vector) | Self-hosted; Weaviate Cloud | LKR 30K-100K/mo managed |
| Pinecone | Managed RAG, fast MVP, minimal ops team | Strong - vendor-managed scaling | Strong | Good (sparse-dense in serverless) | Managed cloud only | LKR 25K-80K+/mo (usage-based) |
| Milvus / Zilliz | Very large corpora, GPU clusters, research scale | Very strong at billion-vector scale | Good | Good (BM25 via integrations) | Self-hosted K8s; Zilliz Cloud | Infra-heavy - enterprise budgets |
LKR bands assume ~LKR 320/USD and include compute/storage, not embedding API fees. OpenAI text-embedding-3-small adds roughly LKR 4-15 per 1M tokens ingested - budget separately.
Quick capability matrix
| Need | First choice | Alternative |
|---|---|---|
| Fastest local prototype | Chroma | pgvector on dev Postgres |
| Already on PostgreSQL | pgvector | Qdrant if you outgrow ANN perf |
| Department-level RBAC on chunks | Qdrant or Weaviate | pgvector + row-level security |
| SKU + natural language queries | Weaviate hybrid | Qdrant sparse+dense |
| No DevOps for vector tier | Pinecone | Weaviate Cloud / Qdrant Cloud |
| Data must stay on your VPC | Qdrant or Weaviate self-hosted | pgvector on private Postgres |
Related reading: Build the full assistant: private AI assistant with RAG (architecture, cost, security). Overview: AI & ML Sri Lanka.
Best Vector Database for RAG - Selection Flow
- Estimate corpus size - chunks after splitting (not raw page count). 5,000 docs × 20 chunks ≈ 100K vectors.
- List compliance constraints - data residency, air-gap, audit logs. Rules out some managed SaaS regions.
- Check filter requirements - per-user, per-department, per-classification. Weak filtering forces over-retrieval and leakage risk.
- Decide hybrid need - if >30% of queries include exact codes or names, plan BM25 + vector from day one.
- Prototype on Chroma or pgvector - run 30-50 golden Q&A pairs before buying managed scale.
- Load-test p95 latency at expected QPS with real filters before production cutover.
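For the load-test step, p95 can be computed from recorded per-query latencies. The sketch below uses the nearest-rank method, which is one of several percentile definitions; the `percentile` helper and the sample latencies are hypothetical.

```typescript
// Nearest-rank percentile over recorded latencies (illustrative sketch).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("No samples recorded");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // ascending, numeric compare
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based nearest rank
  return sorted[rank - 1];
}

// Hypothetical latencies (ms) from 20 retrieval calls during a load test.
const latenciesMs = [42, 38, 51, 47, 40, 39, 44, 210, 43, 41, 46, 45, 37, 49, 48, 52, 36, 50, 35, 120];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log({ p50, p95 });
```

In this sample the median is 44 ms while p95 is 120 ms, which shows why an SLA set on the average or median can hide the slow queries users actually notice.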
Minimal retrieval example (pgvector + TypeScript)
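// Assumes chunks already exist in Postgres with an embedding column (vector(1536))
import { Pool } from "pg";

const db = new Pool(); // connection settings come from the PG* environment variables

// queryEmbedding: number[] from your embedding model; userAllowedDepartments: string[]
const { rows } = await db.query(`
  SELECT chunk_text, doc_id, title
  FROM document_chunks
  WHERE department = ANY($2)          -- RBAC filter in the same query
  ORDER BY embedding <=> $1::vector   -- <=> is pgvector cosine distance
  LIMIT 8
`, [JSON.stringify(queryEmbedding), userAllowedDepartments]); // pgvector expects '[0.1,0.2,...]' text
// Pass rows to the LLM with citation instructions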
Evaluation Metrics for Vector Search
Pick a golden set of 30-100 question-answer pairs from real users before committing to a vendor. Measure retrieval quality separately from generation quality.
| Metric | What it measures | Target (starting point) |
|---|---|---|
| Precision@k | Share of top-k results that are relevant | No fixed target; track the trend as you compare chunking strategies |
| Recall@k | Share of all relevant docs found in top-k | Critical for RAG - missed chunk = wrong answer |
| MRR (mean reciprocal rank) | How high the first relevant hit ranks | Higher = less noise in LLM context window |
| nDCG@k | Ranking quality when relevance is graded | Use when multiple partial matches exist |
| Query latency p95 | End-to-end retrieve + filter (+ rerank) | Set SLA per product (e.g. <300ms retrieve for chat) |
| Cost per 1M queries | DB + embedding + reranker infra | Track monthly; spikes often from re-embeds |
| Faithfulness (RAG end-to-end) | Answer supported by retrieved chunks | Covered in the RAG assistant guide |
How to run evals: Label each golden question with the doc IDs that should appear in top-k. Script retrieval across embedding models and DB configs. Tools like Ragas, DeepEval, or a simple spreadsheet work for v1 - perfection is less important than repeatable before/after comparisons.
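A scripted eval along those lines might look like the sketch below, which computes Precision@k, Recall@k, and MRR over a labelled golden set. The `GoldenCase` shape and the two sample cases are assumptions for illustration; nDCG is omitted for brevity, and the retrieved IDs are assumed to be de-duplicated per document.

```typescript
// Retrieval metrics over a golden set (illustrative sketch).
// Assumption: retrievedDocIds contains each doc ID at most once.

interface GoldenCase {
  question: string;
  relevantDocIds: string[]; // doc IDs a human labelled as correct
  retrievedDocIds: string[]; // ranked output of the retriever under test
}

function precisionAtK(c: GoldenCase, k: number): number {
  const relevant = new Set(c.relevantDocIds);
  const hits = c.retrievedDocIds.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

function recallAtK(c: GoldenCase, k: number): number {
  if (c.relevantDocIds.length === 0) return 0;
  const retrieved = new Set(c.retrievedDocIds.slice(0, k));
  const hits = c.relevantDocIds.filter((id) => retrieved.has(id)).length;
  return hits / c.relevantDocIds.length;
}

function reciprocalRank(c: GoldenCase): number {
  const relevant = new Set(c.relevantDocIds);
  const index = c.retrievedDocIds.findIndex((id) => relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1); // 0 when nothing relevant was retrieved
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Two hypothetical golden cases with hand-labelled relevant documents.
const goldenSet: GoldenCase[] = [
  { question: "How many leave days do I get?", relevantDocIds: ["a", "b"], retrievedDocIds: ["a", "x", "b", "y"] },
  { question: "What is the travel claim limit?", relevantDocIds: ["c", "d"], retrievedDocIds: ["x", "c", "y", "z"] },
];

const report = {
  precisionAt4: mean(goldenSet.map((c) => precisionAtK(c, 4))),
  recallAt4: mean(goldenSet.map((c) => recallAtK(c, 4))),
  mrr: mean(goldenSet.map(reciprocalRank)),
};
console.log(report);
```

Running the same script before and after a change to chunking, embedding model, or database settings gives the repeatable comparison described above.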
Common Mistakes
- Wrong embedding model for language - test Sinhala/Tamil/English mixes on a sample set; use multilingual embedders when needed
- Chunks too large or too small - 3,000-token chunks dilute relevance; 50-token chunks lose section context
- No metadata for ACL - without department or classification filters, vector search can return chunks from any part of the corpus, including documents the user should not see
- Skipping hybrid search - pure vector misses exact policy numbers and SKUs
- Choosing Pinecone on day one - fine for scale, expensive to learn chunking on; prototype cheaper first
- No re-ingestion on doc update - stale chunks cause confident wrong answers
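One way to catch the stale-chunk problem is to store a content hash per document and re-embed only when it changes. The sketch below assumes a hypothetical `storedHashes` map kept alongside the vector store; `contentHash` is a simple FNV-1a-style hash over UTF-16 code units, chosen only to keep the example dependency-free. A production pipeline would more likely use SHA-256 from a crypto library.

```typescript
// Detect documents whose content changed since the last ingestion (illustrative sketch).

interface SourceDoc {
  docId: string;
  content: string;
}

// FNV-1a-style 32-bit hash; not cryptographic, used here only for change detection.
function contentHash(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32-bit arithmetic
  }
  return (hash >>> 0).toString(16);
}

// Returns IDs of documents that are new or whose hash differs from the stored one.
function docsNeedingReingestion(docs: SourceDoc[], storedHashes: Map<string, string>): string[] {
  return docs
    .filter((d) => storedHashes.get(d.docId) !== contentHash(d.content))
    .map((d) => d.docId);
}

// Hashes recorded at the previous ingestion run (hypothetical data).
const storedHashes = new Map<string, string>([
  ["A", contentHash("v1 text")],
  ["B", contentHash("old")],
]);

const currentDocs: SourceDoc[] = [
  { docId: "A", content: "v1 text" }, // unchanged
  { docId: "B", content: "new" }, // edited
  { docId: "C", content: "brand new" }, // never ingested
];

const staleDocIds = docsNeedingReingestion(currentDocs, storedHashes);
console.log(staleDocIds);
```

For each returned ID, the pipeline would delete the old chunks, re-chunk and re-embed the document, then update the stored hash.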
Connect to Your Private AI Assistant Build
Choosing a vector database is Layer 2 of five in a production RAG stack. The private assistant guide covers what this page intentionally skips in depth:
| Topic | Covered in |
|---|---|
| Ingestion pipeline, chunking, OCR | RAG private AI assistant |
| Vector DB shortlist + LKR cost bands | RAG guide - Layer 2 + this page (deep comparison) |
| RBAC, permissions, audit logs | RAG guide - Layer 3 & 5 |
| LLM choice, faithfulness eval | RAG guide - Layer 4 |
| Build budget LKR 400K-1.2M SME range | RAG guide - cost section |
Typical path: prototype retrieval with pgvector or Chroma on a golden Q&A set (this guide) → lock stack and permissions in the private AI assistant architecture → deploy with monitoring and faithfulness evals before staff rollout.
Frequently Asked Questions
What is the best vector database for RAG in 2026?
There is no universal winner. pgvector fits Postgres-native teams under ~500K chunks. Qdrant fits filtered, self-hosted RAG. Weaviate fits hybrid search. Pinecone fits managed scale with minimal ops. Prototype before committing.
pgvector vs dedicated vector database?
pgvector keeps one database for app data and vectors, which means simpler ops and good ANN performance at SME scale. Dedicated stores add better filtering ergonomics, hybrid features, and horizontal scaling past roughly 1M vectors or at high QPS. Migrate when benchmarks on your data show that Postgres ANN or write throughput is the bottleneck.
Do I need hybrid search?
Yes, when users query exact identifiers (invoice numbers, SKUs, section codes) alongside natural language. Pure vector search can rank semantically similar but wrong documents above the exact match.
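A common way to combine keyword and vector results is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across the ranked lists. The sketch below is a minimal version using the commonly cited constant k = 60; the query, document IDs, and the `reciprocalRankFusion` helper are hypothetical, and databases with built-in hybrid search may use a different fusion method.

```typescript
// Reciprocal rank fusion (RRF) of several ranked ID lists (illustrative sketch).
function reciprocalRankFusion(rankings: string[][], k: number = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Hypothetical results for the query "refund policy INV-2041".
const vectorHits = ["refund-policy", "returns-faq", "invoice-INV-2041"]; // dense ranking
const keywordHits = ["invoice-INV-2041", "refund-policy"]; // BM25 ranking

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map((r) => r.id));
```

In this toy case the exact invoice match sits third in the vector-only ranking but moves up to second after fusion, because the keyword ranking placed it first.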
How much does a vector database cost for a Sri Lankan SME?
Self-hosted pgvector or Qdrant on a small cloud VM often runs LKR 8,000-50,000/month excluding embedding and LLM API fees. Managed Pinecone/Weaviate tiers commonly land LKR 25,000-100,000/month at early production scale. Full assistant TCO is in the RAG cost section.
Can Hashtag Coders implement vector search for our product?
Yes. We design ingestion, vector store selection, evaluation harnesses, and RAG interfaces for Sri Lankan and international clients. Start with a scoped retrieval POC before a full assistant build.
Conclusion
Vector database selection should end with workload fit, not brand loyalty. Match scale, metadata filtering, hybrid search, hosting, and cost to your compliance needs and team skills - then measure Precision@k and latency on real queries. For the full private knowledge assistant - ingestion through governance - continue to the RAG private AI assistant guide.