📚
1. Knowledge Freshness
Stable corpus, rare updates (1–2): fine-tune. Daily/weekly updates, regulated domain with audit trail (4–5): RAG. A fine-tuned model goes stale as soon as the underlying facts change and must be retrained to pick up updates.
🎯
2. Output Structure
Highly specific format, tone, persona (1–2): fine-tune. Free-form answers grounded in source (4–5): RAG. JSON-mode and structured outputs blur this line in 2026.
🔍
3. Citation Requirement
No citations needed, or cosmetic only (1–2): fine-tune. Hard regulatory requirement (medical, legal, finance) (4–5): RAG. RAG returns source chunks; fine-tuned models cannot reliably cite.
💰
4. Query Volume
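10M+ queries/year, narrow domain (1–2): fine-tune, since the training cost amortizes across queries. Sub-1M queries or broad domain (4–5): RAG. The cost crossover depends on context length, because retrieved chunks add input tokens to every RAG call; calculate before committing.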
⚡
5. Latency Budget
Sub-300ms p95 needed (1–2): fine-tune (single LLM call). 500–2000ms acceptable (4–5): RAG (embed + retrieve + generate adds 200–800ms).
🔐
6. Data Sensitivity
Public domain knowledge (1–2): either works. Confidential corpus that cannot leave VPC (4–5): RAG with self-hosted embeddings, or fine-tune on a self-hosted base model.
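The Query Volume crossover in dimension 4 can be estimated with a few lines of arithmetic. The sketch below is one possible way to do it, not a pricing model: the `CostModel` class, the `crossover_queries` helper, and every dollar and token figure are hypothetical placeholders to be replaced with your own quotes. It assumes each approach has a fixed annual cost plus a per-query token cost, and that RAG's extra cost comes from the retrieved tokens added to each prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Hypothetical annual cost model: fixed cost plus per-query token cost."""
    fixed_per_year: float    # training runs or retrieval infra, USD/year
    input_tokens: int        # prompt tokens per query (incl. retrieved chunks)
    output_tokens: int       # generated tokens per query
    usd_per_m_input: float   # USD per 1M input tokens
    usd_per_m_output: float  # USD per 1M output tokens

    def per_query(self) -> float:
        tokens_cost = (self.input_tokens * self.usd_per_m_input
                       + self.output_tokens * self.usd_per_m_output)
        return tokens_cost / 1_000_000

    def annual(self, queries: int) -> float:
        return self.fixed_per_year + queries * self.per_query()


def crossover_queries(fine_tune: CostModel, rag: CostModel) -> Optional[float]:
    """Annual query volume above which fine-tuning is cheaper than RAG.

    Returns None when fine-tuning is not cheaper per query, so its extra
    fixed cost can never be recovered.
    """
    saving_per_query = rag.per_query() - fine_tune.per_query()
    if saving_per_query <= 0:
        return None
    extra_fixed = fine_tune.fixed_per_year - rag.fixed_per_year
    return max(extra_fixed, 0.0) / saving_per_query


# Placeholder numbers only: a short prompt on a pricier fine-tuned endpoint
# versus a long retrieved context on a cheaper base model.
fine_tune = CostModel(fixed_per_year=20_000, input_tokens=300, output_tokens=200,
                      usd_per_m_input=2.0, usd_per_m_output=8.0)
rag = CostModel(fixed_per_year=5_000, input_tokens=3_300, output_tokens=200,
                usd_per_m_input=1.0, usd_per_m_output=4.0)

breakeven = crossover_queries(fine_tune, rag)
print(f"Fine-tune breaks even at about {breakeven:,.0f} queries/year")
```

With these placeholder inputs the break-even lands just under 8M queries/year. Shrinking the retrieved context to a few hundred tokens pushes the break-even far higher or removes it entirely, which is why the context length matters more than the headline query count.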
📊
Decision Bands
Sum 6–14: Fine-tune (or prompt-engineer first).
Sum 15–22: Hybrid (fine-tune for style, RAG for facts).
Sum 23–30: RAG only.
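The scoring above is simple enough to encode directly. The following is a minimal sketch, assuming each of the six dimensions is scored as an integer from 1 to 5; the dimension keys and the `recommend` function name are illustrative choices, not part of the scorecard itself.

```python
DIMENSIONS = (
    "knowledge_freshness",
    "output_structure",
    "citation_requirement",
    "query_volume",
    "latency_budget",
    "data_sensitivity",
)


def recommend(scores: dict[str, int]) -> tuple[int, str]:
    """Sum six 1-5 scores and map the total onto the decision bands."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these keys: {DIMENSIONS}")
    for name, value in scores.items():
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5")

    total = sum(scores.values())  # always between 6 and 30
    if total <= 14:
        band = "fine-tune (or prompt-engineer first)"
    elif total <= 22:
        band = "hybrid: fine-tune for style, RAG for facts"
    else:
        band = "RAG only"
    return total, band


# Example: frequently updated, citation-heavy corpus with a moderate latency budget.
example = {
    "knowledge_freshness": 5,
    "output_structure": 2,
    "citation_requirement": 5,
    "query_volume": 3,
    "latency_budget": 4,
    "data_sensitivity": 3,
}
total, band = recommend(example)
print(total, band)
```

The example sums to 22, which sits at the top of the hybrid band; one more point on any dimension would tip it into RAG only, so scores near a band edge deserve a second look rather than a mechanical reading.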