---
title: AI Architecture Patterns Library
canonical: https://www.slavin.ai/data/ai-architecture-patterns.json
sourceJSON: https://www.slavin.ai/data/ai-architecture-patterns.json
license: CC-BY-4.0
lastUpdated: 2026-06-20
totalPatterns: 18
categories: ["retrieval", "generation", "agent", "evaluation", "ops", "safety"]
---

# AI Architecture Patterns Library

18 named, structured patterns for production AI systems. Gang-of-Four
style: each entry declares **intent**, **structure**, **participants**,
**consequences** (empirical from 30+ SLAtech deployments), **when-to-use**,
**when-to-avoid**, and **related patterns**.

Cite by pattern `@id`:
`https://www.slavin.ai/data/ai-architecture-patterns.json#<pattern-id>`

---

## Retrieval patterns

### Citation-Grounded RAG (`citation-grounded-rag`)
**Intent:** Generate LLM responses that include verifiable citations to retrieved source documents so users can verify each claim.
**Consequences:** +40-65% hallucination reduction. +200-400ms latency. ~10% of responses regenerated.
**Use when:** Regulated domains (medical, legal, financial). **Avoid when:** Casual chat where citations disrupt flow.
**Related:** Hybrid Retrieval, Abstention Permission.
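
A minimal sketch of the citation check that could sit between generation and delivery. The bracketed `[doc-id]` marker format and the function names are assumptions made for illustration. The check only confirms that every cited id was actually retrieved; it does not confirm that the source supports the claim.

```python
import re

# Hypothetical citation format: a source id in square brackets, e.g. [doc-1].
CITATION = re.compile(r"\[([A-Za-z0-9][A-Za-z0-9_-]*)\]")

def citation_problems(answer: str, retrieved_ids: set[str]) -> list[str]:
    """List reasons the answer fails the citation check (empty list = pass)."""
    cited = CITATION.findall(answer)
    if not cited:
        return ["no citations"]
    return [f"unknown source: {c}" for c in cited if c not in retrieved_ids]

def needs_regeneration(answer: str, retrieved_ids: set[str]) -> bool:
    # A failed check is what would trigger a regeneration pass.
    return bool(citation_problems(answer, retrieved_ids))
```

A caller would typically retry generation a bounded number of times when `needs_regeneration` returns `True`, then fall back to a refusal.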

### Hybrid Retrieval (Sparse + Dense) (`hybrid-retrieval`)
**Intent:** Retrieve relevant chunks for queries containing both semantic concepts and exact-match tokens (codes, names, identifiers).
**Consequences:** +20-35% recall on exact-match queries. Two indexes to maintain.
**Use when:** Corpora heavy with technical IDs. **Avoid when:** Purely conversational Western-text corpora that strong embeddings already serve well.
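
One common way to merge the sparse and dense result lists is reciprocal rank fusion (RRF). The library entry does not say which fusion method is used, so treat this as one possible choice; the document ids and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

sparse_hits = ["sku-4471", "faq-2", "faq-9"]    # e.g. from a keyword index
dense_hits = ["faq-2", "guide-1", "sku-4471"]   # e.g. from an embedding index
fused = rrf_fuse([sparse_hits, dense_hits])
```

Ids that appear in both lists rise to the top, which is the behaviour the pattern relies on for queries mixing concepts with exact tokens.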

### Cross-Encoder Reranking Stage (`reranking-stage`)
**Intent:** Reduce noise in retrieved context by re-scoring top-N candidates with a precision-focused model.
**Consequences:** +25-50% reduction in irrelevant-context hallucinations. +100-400ms reranking latency. ~$1-5 per 1K reranks.
**Use when:** Recall is adequate and precision is the bottleneck. **Avoid when:** Low-cost chat where per-call cost matters.
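
The stage itself is small: score each candidate against the query, sort, keep the top few. The sketch below assumes the scorer is injected as a function; `overlap_score` is a toy lexical stand-in so the example runs without a model, not a real cross-encoder.

```python
from typing import Callable

def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_k: int = 3) -> list[str]:
    """Re-score candidates with a precision-focused scorer and keep the best top_k."""
    scored = [(score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def overlap_score(query: str, passage: str) -> float:
    # Toy stand-in: fraction of query words that appear in the passage.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

candidates = [
    "reset your password in settings",
    "billing cycles explained",
    "password reset email not arriving",
]
best = rerank("password reset email", candidates, overlap_score, top_k=2)
```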

### Chunk Quality Filter (`chunk-quality-filter`)
**Intent:** Prevent low-quality chunks (boilerplate, OCR errors) from reaching the LLM context.
**Consequences:** +15-30% reduction in noise hallucinations.
**Use when:** Web-scraped / OCR'd / mixed-quality corpora.
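
A heuristic filter along these lines can run at ingestion time. The three signals and their thresholds are assumptions chosen for illustration and would need tuning per corpus.

```python
def is_clean_chunk(text: str, min_words: int = 8,
                   min_alpha_ratio: float = 0.6,
                   max_repeat_ratio: float = 0.5) -> bool:
    """Reject chunks that are too short, mostly symbols, or mostly repeated words."""
    words = text.split()
    if len(words) < min_words:
        return False  # fragments such as "Click here"
    non_space = [ch for ch in text if not ch.isspace()]
    alpha_ratio = sum(ch.isalpha() for ch in non_space) / len(non_space)
    if alpha_ratio < min_alpha_ratio:
        return False  # typical of OCR noise
    repeat_ratio = 1 - len({w.lower() for w in words}) / len(words)
    return repeat_ratio <= max_repeat_ratio  # high repetition suggests boilerplate
```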

### Tenant-Isolation RAG (`tenant-isolation-rag`)
**Intent:** In multi-tenant SaaS, ensure tenant A's data never appears in tenant B's retrieved context.
**Consequences:** Cheaper than separate indexes per tenant. A filter-bypass bug causes a catastrophic cross-tenant leak.
**Use when:** Multi-tenant SaaS AI with contractual data confidentiality.
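
Because a single missed filter is catastrophic, one defensive layout applies the tenant filter at query time and then re-checks the results independently before they reach the prompt. The sketch uses an in-memory list and substring matching as stand-ins for a real index; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str

class TenantLeakError(RuntimeError):
    pass

def assert_single_tenant(chunks: list[Chunk], tenant_id: str) -> list[Chunk]:
    # Layer 2: independent post-retrieval check; fail closed on any mismatch.
    for chunk in chunks:
        if chunk.tenant_id != tenant_id:
            raise TenantLeakError(f"chunk from {chunk.tenant_id!r} in {tenant_id!r} context")
    return chunks

def retrieve_for_tenant(index: list[Chunk], tenant_id: str, query: str) -> list[Chunk]:
    # Layer 1: the tenant filter is part of the query itself, never optional.
    hits = [c for c in index
            if c.tenant_id == tenant_id and query.lower() in c.text.lower()]
    return assert_single_tenant(hits, tenant_id)
```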

---

## Generation patterns

### Self-Consistency Voting (`self-consistency-voting`)
**Intent:** Improve reasoning accuracy by sampling multiple responses and voting.
**Consequences:** +15-35% error reduction on reasoning tasks. Cost N× per query, where N is the number of samples.
**Use when:** High-stakes single-shot decisions (medical triage, financial calc).
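
The voting step can be sketched as a majority vote over normalized final answers. The agreement threshold and the choice to return `None` when no answer wins are assumptions; sampling the N responses is left to the caller.

```python
from collections import Counter
from typing import Optional

def vote(answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if its share exceeds min_agreement, else None."""
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        return None
    winner, count = Counter(normalized).most_common(1)[0]
    return winner if count / len(normalized) > min_agreement else None
```

Returning `None` on weak agreement lets the caller escalate or abstain instead of accepting a plurality answer.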

### Structured Output Schema (Strict) (`structured-output-schema`)
**Intent:** Force LLM output to match strict JSON Schema for downstream code consumption.
**Consequences:** +60-85% schema-shape hallucination reduction. +100-300ms latency.
**Use when:** Output feeds APIs, data extraction, form filling.
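
The pattern is usually enforced at decode time or with a JSON Schema validator. As a simpler stand-in, the sketch below validates after generation using only the standard library: exact key set, exact types, reject otherwise. The invoice fields are hypothetical.

```python
import json

# Hypothetical target shape: field name -> required Python type.
SCHEMA = {"invoice_id": str, "total_cents": int, "currency": str}

def parse_strict(raw: str, schema: dict = SCHEMA) -> dict:
    """Parse model output and raise ValueError unless it matches the schema exactly."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, typ in schema.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it for int fields.
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise ValueError(f"{key}: expected {typ.__name__}")
    return data
```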

### Abstention Permission (`abstention-permission`)
**Intent:** Allow LLM to refuse when context is insufficient instead of fabricating.
**Consequences:** +25-40% hallucination reduction. +15-30% refusal rate.
**Use when:** RAG where a wrong answer is worse than no answer.
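
Abstention is often granted in the prompt itself. A complementary guard, sketched here, refuses before generation when retrieval looks too weak. The score threshold, the refusal text, and the `generate` callable are all illustrative assumptions.

```python
from typing import Callable

ABSTAIN = "I don't have enough information in the provided sources to answer that."

def answer_or_abstain(scored_chunks: list[tuple[float, str]],
                      generate: Callable[[list[str]], str],
                      min_score: float = 0.5, min_chunks: int = 1) -> str:
    """Call the generator only when enough chunks clear the relevance threshold."""
    strong = [text for score, text in scored_chunks if score >= min_score]
    if len(strong) < min_chunks:
        return ABSTAIN
    return generate(strong)
```

Raising `min_score` trades a higher refusal rate for fewer fabricated answers, which is the trade-off listed under Consequences.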

---

## Agent patterns

### Tool-Use Agent (`tool-use-agent`)
**Intent:** LLM performs actions on external systems via structured function calls.
**Consequences:** Unlocks operational capability. Higher risk: tool calls have side effects.
**Use when:** Workflows needing lookup + computation + action.
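
A sketch of the dispatch side of tool use: the model emits a structured call, and the application looks it up in an explicit registry. The JSON shape (`name`, `arguments`) and the `get_order_status` tool are hypothetical; real function-calling APIs each define their own format.

```python
import json

def get_order_status(order_id: str) -> str:
    # Hypothetical read-only tool; a real one would query an order system.
    return f"order {order_id}: shipped"

# Explicit allowlist: only registered tools can ever run.
TOOLS = {"get_order_status": get_order_status}

def dispatch(tool_call_json: str) -> str:
    """Execute one model-issued tool call and return a result or an error string."""
    call = json.loads(tool_call_json)
    name, args = call.get("name"), call.get("arguments", {})
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**args)
    except TypeError as exc:
        # Wrong or missing arguments (also catches TypeErrors raised inside the tool).
        return f"error: bad arguments for {name}: {exc}"
```

Returning errors as strings lets the model see the failure and retry, while the registry keeps side effects limited to known functions.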

### Planner-Executor (`planner-executor`)
**Intent:** Decompose complex goal into plan, execute each step (possibly with specialized models).
**Consequences:** Higher quality on complex tasks. Higher cost (multiple LLM calls).
**Use when:** Multi-step tasks with clear decomposition.
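
The control flow can be sketched with the planner and executors injected as plain functions. In practice each would wrap a model call; the step kinds and the toy planner here are assumptions made so the example runs on its own.

```python
from typing import Callable

Step = tuple[str, str]                      # (executor kind, instruction)
Executor = Callable[[str, list[str]], str]  # (instruction, prior results) -> result

def run_plan(goal: str, planner: Callable[[str], list[Step]],
             executors: dict[str, Executor]) -> list[str]:
    """Ask the planner for steps, then run each step with its matching executor."""
    results: list[str] = []
    for kind, instruction in planner(goal):
        if kind not in executors:
            raise KeyError(f"no executor registered for step kind {kind!r}")
        # Each executor sees a copy of the results produced so far.
        results.append(executors[kind](instruction, list(results)))
    return results

def toy_planner(goal: str) -> list[Step]:
    return [("lookup", goal), ("summarize", "condense findings")]

toy_executors = {
    "lookup": lambda instruction, prior: f"facts about {instruction}",
    "summarize": lambda instruction, prior: f"summary of {len(prior)} result(s)",
}
outputs = run_plan("refund policy", toy_planner, toy_executors)
```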

---

## Evaluation patterns

### LLM-as-Judge Sampling (`llm-as-judge`)
**Intent:** Quality-check production outputs without human review of every response.
**Consequences:** Detection (not prevention) of degradation 2-7 days earlier than user complaints. +5-10% cost.
**Use when:** Production deployments where quality regression detection matters.
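
The sampling half of the pattern can be made deterministic by hashing the response id, so the same response is always in or out of the sample. The 5% default rate and the `judge` callable (which would wrap a model call returning a score) are assumptions.

```python
import hashlib
from typing import Callable, Optional

def in_sample(response_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of ids using a hash bucket."""
    digest = hashlib.sha256(response_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") < rate * 2**64

def judge_sample(responses: dict[str, str], judge: Callable[[str], float],
                 rate: float = 0.05) -> Optional[float]:
    """Average judge score over the sampled responses, or None if none were sampled."""
    scores = [judge(text) for rid, text in responses.items() if in_sample(rid, rate)]
    return sum(scores) / len(scores) if scores else None
```

Tracking the returned average over time gives the degradation signal; the judge detects problems after the fact and does not block individual responses.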

### Fact Extraction + Validation (`fact-extraction-validation`)
**Intent:** For long-form outputs, extract atomic claims and validate each separately.
**Consequences:** +40-70% surface hallucination reduction. Cost 2-3× per query.
**Use when:** Publication-grade output, legal/medical context.
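
The two stages can be sketched with deliberately crude stand-ins: sentence splitting in place of atomic-claim extraction, and word overlap in place of a validation model. Both substitutions and the 0.6 threshold are assumptions; only the extract-then-validate-each shape reflects the pattern.

```python
import re

def extract_claims(text: str) -> list[str]:
    # Stand-in for claim extraction: one claim per sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_supported(claim: str, sources: list[str], min_overlap: float = 0.6) -> bool:
    # Stand-in for validation: share of claim words found in a single source.
    words = _words(claim)
    if not words:
        return False
    return any(len(words & _words(s)) / len(words) >= min_overlap for s in sources)

def unsupported_claims(text: str, sources: list[str]) -> list[str]:
    """Return the claims that no source supports, for removal or human review."""
    return [c for c in extract_claims(text) if not is_supported(c, sources)]
```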

---

## Ops patterns

### Drift Monitoring (`drift-monitoring`)
**Intent:** Detect slow quality degradation before users complain.
**Consequences:** Flags 1-4 weeks earlier than user-reported drop. False alarms from legitimate shift.
**Use when:** Production AI running >3 months.
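
One simple detector compares a recent window of quality scores against a baseline window and flags a drop that exceeds a z-score threshold. The metric, window sizes, and threshold of 3 are assumptions; a legitimate shift in traffic would trip it too, which is the false-alarm cost noted above.

```python
from statistics import mean, stdev

def drift_detected(baseline: list[float], recent: list[float],
                   z_threshold: float = 3.0) -> bool:
    """True when the recent mean sits more than z_threshold standard errors below baseline.

    Requires at least two baseline points and one recent point.
    """
    standard_error = stdev(baseline) / len(recent) ** 0.5
    drop = mean(baseline) - mean(recent)
    if standard_error == 0:
        return drop > 0  # constant baseline: any drop counts
    return drop / standard_error > z_threshold
```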

### Version-Pin Everything (`version-pinning`)
**Intent:** Prevent silent vendor model updates from breaking production overnight.
**Consequences:** Prevents surprise regression. Pinned versions get deprecated; plan migrations.
**Use when:** Any production AI. Hygiene, not optional.
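
A pinning rule is easy to enforce in CI with a config lint. The sketch assumes a pinned identifier ends in a date (`YYYY-MM-DD`) or a full semantic version; that convention and the model names are hypothetical, since vendors name versions differently.

```python
import re

# Assumption: pinned identifiers end in a date or a three-part version number.
PINNED_SUFFIX = re.compile(r"(\d{4}-\d{2}-\d{2}|\d+\.\d+\.\d+)$")

def unpinned_entries(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model identifier is not pinned."""
    return sorted(name for name, ident in config.items()
                  if not PINNED_SUFFIX.search(ident))

config = {
    "chat_model": "example-chat-2025-01-15",
    "embedder": "example-embed@1.4.2",
    "reranker": "example-rerank-latest",
    "judge": "example-judge",
}
violations = unpinned_entries(config)
```

Failing the build when `violations` is non-empty keeps floating aliases such as `latest` out of production.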

### Prompt Cache Warming (`prompt-cache-warming`)
**Intent:** Reduce LLM cost + latency on workflows with repeated long system prompts.
**Consequences:** 10-75% cost discount on cached portion. 30-60% latency reduction on hit.
**Use when:** RAG with repeated context, high-volume API workflows.
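
To judge whether caching pays off, a rough input-cost estimate helps. The sketch assumes a simple pricing model in which cached prefix tokens are billed at a discount on a hit and at full price on a miss; it ignores any cache-write surcharge a vendor may apply. All figures are illustrative.

```python
def effective_input_cost(prefix_tokens: int, suffix_tokens: int,
                         price_per_token: float, cache_discount: float,
                         hit_rate: float) -> float:
    """Expected input cost per request when the shared prefix is cached.

    cache_discount: fraction taken off cached tokens (0.5 means half price).
    hit_rate: fraction of requests that find the prefix in cache.
    """
    full_cost = (prefix_tokens + suffix_tokens) * price_per_token
    expected_saving = prefix_tokens * price_per_token * cache_discount * hit_rate
    return full_cost - expected_saving

# 8,000-token system prompt, 2,000-token query, 50% discount, 90% hit rate.
cost = effective_input_cost(8000, 2000, 1e-6, 0.5, 0.9)
```

Warming matters because the saving scales with `hit_rate`: a cache that expires between requests yields no discount.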

---

## Safety patterns

### Human-in-Loop Gates (`human-in-loop-gates`)
**Intent:** Insert human review at specific points when AI output drives consequential decisions.
**Consequences:** Prevents harm. Operational overhead. Reviewer drift over time.
**Use when:** Regulated domains, high-cost or irreversible decisions.
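
A gate can be sketched as a routing function in front of execution. The fields on `Decision` and the thresholds are assumptions; the point is that irreversible, high-value, or low-confidence outputs are diverted to a reviewer queue instead of being auto-applied.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    amount: float
    reversible: bool
    model_confidence: float

def route(decision: Decision, amount_limit: float = 1000.0,
          min_confidence: float = 0.9) -> str:
    """Return 'human_review' for consequential decisions, else 'auto_approve'."""
    if (not decision.reversible
            or decision.amount > amount_limit
            or decision.model_confidence < min_confidence):
        return "human_review"
    return "auto_approve"
```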

### Audit Log with Full Context (`audit-log-with-context`)
**Intent:** Make every output reproducible after the fact, as required for post-incident review and regulatory response.
**Consequences:** Enables root-cause analysis. Storage cost. PII implications if logs contain personal data.
**Use when:** EU AI Act high-risk, US sectoral regulation, any deployment where "how did this happen" has value.
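
A sketch of what "full context" might mean for one request: the exact model version, prompt, retrieved ids, and output, serialized as one JSON line with a content hash. The field names are assumptions, and a real deployment would redact or encrypt personal data before writing.

```python
import hashlib
import json

def audit_line(*, request_id: str, timestamp: str, model_version: str,
               prompt: str, retrieved_ids: list[str], output: str) -> str:
    """Serialize one request's full context as a single JSON line."""
    record = {
        "request_id": request_id,
        "timestamp": timestamp,
        "model_version": model_version,
        "prompt": prompt,
        "retrieved_ids": list(retrieved_ids),
        "output": output,
    }
    # Hash the canonical form so later edits to the record are detectable.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return json.dumps(record, sort_keys=True, ensure_ascii=False)
```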

### Model Sandboxing for Agents (`model-sandboxing`)
**Intent:** Constrain what an LLM agent can DO when given code-execution or browser-use tools.
**Consequences:** Bounds blast radius. +500ms-1s overhead per sandbox start.
**Use when:** Agents with consequential tools (code, browser, file-system).
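
Real sandboxes rely on OS-level isolation (containers, VMs, restricted users). As one small application-level layer, a file tool can refuse any path that resolves outside its workspace. The function and exception names below are hypothetical.

```python
from pathlib import Path

class SandboxViolation(PermissionError):
    pass

def resolve_in_sandbox(root: str, requested: str) -> Path:
    """Resolve an agent-supplied path and reject it if it leaves the sandbox root."""
    base = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks; an absolute `requested`
    # replaces `base` entirely and is caught by the containment check below.
    target = (base / requested).resolve()
    if target != base and base not in target.parents:
        raise SandboxViolation(f"{requested!r} escapes the sandbox")
    return target
```

Every file-system tool exposed to the agent would call this before opening a path, so traversal attempts such as `../secrets` fail closed.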

---

End of patterns library.
