🧠
1. Capability Ceiling
OpenAI frontier (GPT-4 class): top-tier across the board, leading on agent/tool use and structured output.
Anthropic frontier (Claude 4 class): top-tier with edges in long context, code generation, and refusal calibration.
Open-source (Llama 4, Mistral, Qwen): 6–12 months behind on broad reasoning; competitive on focused fine-tuned tasks.
💰
2. Cost Structure
OpenAI / Anthropic: per-token API pricing; predictable, with no infrastructure to run. Becomes the more expensive option at very high volume.
Open-source self-hosted: GPU CapEx (or rental) + ops + engineering. Cheaper inference at steady high volume (~10M+ queries/month).
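The crossover between per-token pricing and self-hosting is simple arithmetic, sketched below. Every number here (tokens per query, API price, fixed monthly cost, marginal cost per query) is a hypothetical placeholder, picked only so the result lands near the ~10M queries/month figure above. None of them are real vendor prices, so substitute your own.

```python
def monthly_api_cost(queries: int, tokens_per_query: int,
                     price_per_million_tokens: float) -> float:
    """Per-token pricing: cost grows linearly with volume, no fixed component."""
    return queries * tokens_per_query / 1_000_000 * price_per_million_tokens


def monthly_self_hosted_cost(queries: int, fixed_monthly: float,
                             marginal_per_query: float) -> float:
    """Fixed cost (GPUs, ops, engineering) plus a small per-query cost."""
    return fixed_monthly + queries * marginal_per_query


def break_even_queries(tokens_per_query: int, price_per_million_tokens: float,
                       fixed_monthly: float, marginal_per_query: float):
    """Monthly volume at which both options cost the same.

    Returns None when self-hosting never catches up, i.e. its marginal
    cost per query is not lower than the API cost per query.
    """
    api_per_query = tokens_per_query / 1_000_000 * price_per_million_tokens
    saving_per_query = api_per_query - marginal_per_query
    if saving_per_query <= 0:
        return None
    return fixed_monthly / saving_per_query


# Hypothetical inputs (assumptions, not quoted prices).
TOKENS_PER_QUERY = 1_500
PRICE_PER_MILLION_TOKENS = 5.0   # USD, blended input/output
FIXED_MONTHLY = 60_000.0         # USD: GPU rental + ops + engineering share
MARGINAL_PER_QUERY = 0.0015      # USD: power and amortized capacity per query

volume = break_even_queries(TOKENS_PER_QUERY, PRICE_PER_MILLION_TOKENS,
                            FIXED_MONTHLY, MARGINAL_PER_QUERY)
print(f"Break-even volume: {volume:,.0f} queries/month")
```

Below the break-even volume the fixed cost dominates and the API is cheaper; above it, each extra query widens the gap in favor of self-hosting.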
⚡
3. Latency
API vendors: p50 ~500ms, p95 ~1.5s, occasional outages and latency spikes outside your control.
Self-hosted: tunable; you control the latency budget, but you also own the outages. With co-located GPUs, a sub-200ms p50 is feasible.
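To compare options against figures like these, measure p50 and p95 on your own traffic. A minimal sketch using the nearest-rank percentile definition follows; the sample latencies are made-up values for illustration, not measurements of any vendor.

```python
def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be an integer in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples: list, p50_budget_ms: float, p95_budget_ms: float) -> bool:
    """True when both the median and the tail meet their budgets."""
    return (percentile(samples, 50) <= p50_budget_ms
            and percentile(samples, 95) <= p95_budget_ms)


# Hypothetical request latencies in milliseconds (illustrative only).
latencies_ms = [420, 450, 470, 480, 490, 495, 500, 505, 510, 515,
                530, 560, 600, 650, 720, 800, 950, 1200, 1500, 2400]

print("p50:", percentile(latencies_ms, 50), "ms")
print("p95:", percentile(latencies_ms, 95), "ms")
print("meets 600ms / 1600ms budget:", within_budget(latencies_ms, 600, 1600))
```

Tracking p95 alongside p50 matters because a single spike, like the 2400ms sample here, barely moves the median but is exactly what users notice.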
⚖️
4. Compliance
OpenAI / Anthropic: SOC 2, GDPR DPA, HIPAA via Azure/AWS partners. Data residency varies by region.
Self-hosted open-source: required for strict data residency (Russia's 152-FZ, some banking, some EU public sector). You own the audit trail.
🔓
5. Lock-in Risk
Vendor APIs: medium. Abstract them behind a gateway and you can swap; the real risk is in vendor-specific features (assistants, agents, tools).
Open-source: low on the model itself; high on your operational stack (vLLM, serving infra, eval pipelines).
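The gateway idea can be sketched as a thin interface that application code depends on, with each provider behind an adapter. All names below (`ChatBackend`, `StubBackend`, `Gateway`) are hypothetical, and the backends are stubs that make no network calls; a real adapter would wrap the vendor SDK or your self-hosted endpoint.

```python
from __future__ import annotations

from typing import Protocol


class ChatBackend(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real adapter; tags the reply so the active provider is visible."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Gateway:
    """Routes every call to the currently active backend; swapping is one call."""

    def __init__(self, backends: dict[str, ChatBackend], active: str) -> None:
        if active not in backends:
            raise KeyError(f"unknown backend: {active}")
        self._backends = backends
        self._active = active

    def switch(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._backends[self._active].complete(prompt)


gateway = Gateway(
    {"vendor-a": StubBackend("vendor-a"), "self-hosted": StubBackend("self-hosted")},
    active="vendor-a",
)
print(gateway.complete("Summarize this ticket"))
gateway.switch("self-hosted")
print(gateway.complete("Summarize this ticket"))
```

The interface deliberately exposes only plain completion. Anything vendor-specific (assistants, agents, tool schemas) would have to leak through it, which is where the lock-in described above comes from.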
📊
Common 2026 Pattern
Primary: Anthropic Claude or OpenAI GPT for complex reasoning.
Secondary: the other frontier vendor as A/B alternative.
Fallback: self-hosted open-source for outage resilience and compliance edge cases.
Specialized: fine-tuned small open-source for high-volume narrow tasks.
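One way to wire the four roles together is a router that sends narrow task types to a specialized model and everything else through an ordered failover chain. This is a sketch under assumptions: the backends are stub functions, the task-type key `"classify"` and all function names are hypothetical, and the outage is simulated.

```python
from __future__ import annotations

from typing import Callable, Sequence

Backend = Callable[[str], str]


class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain raised an exception."""


def call_with_failover(prompt: str,
                       chain: Sequence[tuple[str, Backend]]) -> tuple[str, str]:
    """Try backends in order; return (backend_name, reply) from the first that succeeds."""
    errors = []
    for name, backend in chain:
        try:
            return name, backend(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


def route(task_type: str, prompt: str,
          general_chain: Sequence[tuple[str, Backend]],
          specialized: dict[str, Backend]) -> tuple[str, str]:
    """Narrow, high-volume task types go to a specialized model; the rest use the chain."""
    backend = specialized.get(task_type)
    if backend is not None:
        return f"specialized:{task_type}", backend(prompt)
    return call_with_failover(prompt, general_chain)


# Stub backends (assumptions for illustration; no network calls).
def primary(prompt: str) -> str:
    raise ConnectionError("simulated outage")


def secondary(prompt: str) -> str:
    return f"secondary answered: {prompt}"


def self_hosted(prompt: str) -> str:
    return f"self-hosted answered: {prompt}"


def small_classifier(prompt: str) -> str:
    return "label=billing"


chain = [("primary", primary), ("secondary", secondary), ("fallback", self_hosted)]
specialized = {"classify": small_classifier}

print(route("reasoning", "Plan the migration", chain, specialized))
print(route("classify", "Where is my invoice?", chain, specialized))
```

With the primary stub failing, the first call is answered by the secondary; the self-hosted fallback is only reached if both frontier vendors fail.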