🧮 Free Tool · No email required

LLM Cost Calculator

Compare estimated monthly USD cost across the LLM providers most relevant to enterprise buyers: OpenAI, Anthropic, Google, DeepSeek, Meta Llama (on-prem), GigaChat (Russia), and YandexGPT (Russia). Based on public vendor pricing as of June 2026.

Your workload

Input tokens: sum of prompt and retrieved-context (RAG) tokens per month.

Output tokens: sum of model-completion tokens per month. Typically 10-25% of input volume.
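A minimal sketch of the first-order estimate, assuming the calculator multiplies each monthly token volume by the vendor's per-1M rate. Most vendors bill input and output separately; GigaChat and YandexGPT quote one combined rate for both. The function names and the 100M-input / 20M-output workload are hypothetical; the rates are the GPT-4o figures from the pricing notes.

```python
def monthly_cost_split(input_tokens: float, output_tokens: float,
                       input_price_per_m: float, output_price_per_m: float) -> float:
    """USD per month when input and output tokens are billed at different rates."""
    return (input_tokens / 1_000_000) * input_price_per_m + \
           (output_tokens / 1_000_000) * output_price_per_m


def monthly_cost_combined(input_tokens: float, output_tokens: float,
                          price_per_m: float) -> float:
    """USD per month when all tokens share one combined rate."""
    return ((input_tokens + output_tokens) / 1_000_000) * price_per_m


# Hypothetical workload: 100M input tokens, output at 20% of input.
input_tokens = 100_000_000
output_tokens = 20_000_000

# GPT-4o at $2.50 input / $10.00 output per 1M tokens.
gpt4o = monthly_cost_split(input_tokens, output_tokens, 2.50, 10.00)
print(f"GPT-4o: ${gpt4o:,.2f}/month")  # GPT-4o: $450.00/month
```

Output is billed at 4x the input rate here, so 20% of the tokens account for about 44% of the GPT-4o bill; the input/output ratio matters as much as total volume.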

Want a custom architecture review?

This calculator gives a first-order estimate. A real architecture review examines chunking strategy, embedding-model choice, retrieval quality, latency requirements, vendor lock-in, and multi-model routing: the trade-offs that can shift cost by 5-10x.

📅 Book a 30-min architecture review

Pricing sources & assumptions

  • OpenAI: GPT-4o $2.50/$10.00 per 1M input/output tokens · GPT-4o-mini $0.15/$0.60 per 1M (public pricing, June 2026).
  • Anthropic: Claude 3.5 Sonnet $3.00/$15.00 · Claude 3.5 Haiku $0.80/$4.00 per 1M input/output tokens (public pricing).
  • Google: Gemini 2.0 Flash $0.10/$0.40 per 1M input/output tokens (public pricing).
  • DeepSeek: DeepSeek-V3 $0.27/$1.10 per 1M input/output tokens (public pricing).
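  • Llama 3.1 70B on-prem: ~$4.17 per 1M tokens combined (assumes 1× A100 40GB rented at $1.50/hour, 100 tok/sec sustained throughput at full utilization, ~259M tokens per GPU per 30-day month, i.e. $1,080 / 259M tokens). A 70B model fits on a single 40 GB GPU only with aggressive (~4-bit) quantization; FP16 weights alone take ~140 GB. Owned (capex) on-prem hardware costs less per token but requires upfront investment.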
  • GigaChat (Sber): GigaChat Pro ≈ 2.0 rub per 1,000 tokens (combined input/output). At ~85 rub/USD, that is ~$0.024 per 1K tokens, or ~$23.50 per 1M tokens, for Russian-language workloads. RU 152-FZ compliant.
  • YandexGPT (Yandex): YandexGPT Pro ≈ 1.2 rub per 1,000 tokens (combined input/output). At ~85 rub/USD, that is ~$0.014 per 1K tokens, or ~$14.10 per 1M tokens. RU 152-FZ compliant.
  • Estimates exclude fine-tuning, embedding costs, vector-DB hosting, observability, and evaluation tooling. Real-world TCO is typically 2-3x the model-API cost alone.
  • This calculator is informational only. Vendor pricing changes often; confirm on the vendor's pricing page before committing.
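The per-1M figures in these notes can be reproduced and combined into a side-by-side monthly estimate. A minimal sketch, assuming the listed rates, 85 rub/USD, a rented GPU running at 100% utilization, and the 2-3x TCO range; the function names, constants, and the 100M-input / 20M-output workload are hypothetical.

```python
RUB_PER_USD = 85.0                      # exchange rate assumed in the notes above
MONTHLY_GPU_TOKENS = 100 * 3600 * 24 * 30  # 100 tok/s over a 30-day month = 259.2M
TCO_RANGE = (2.0, 3.0)                  # "real-world TCO is typically 2-3x the API cost"


def gpu_usd_per_m(hourly_usd: float, tokens_per_sec: float) -> float:
    """USD per 1M tokens for one self-hosted GPU at full utilization.

    Monthly hours cancel out: cost/hour divided by (millions of tokens)/hour.
    """
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_usd / (tokens_per_hour / 1_000_000)


def rub_per_1k_to_usd_per_m(rub_per_1k: float, rub_per_usd: float = RUB_PER_USD) -> float:
    """Convert a rubles-per-1K-tokens quote to USD per 1M tokens."""
    return rub_per_1k * 1000 / rub_per_usd


# (input, output) USD per 1M tokens.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "DeepSeek-V3": (0.27, 1.10),
}
# Combined-rate offerings: the same price applies to input and output tokens.
COMBINED = {
    "Llama 3.1 70B on-prem": gpu_usd_per_m(1.50, 100),  # ≈ $4.17
    "GigaChat Pro": rub_per_1k_to_usd_per_m(2.0),       # ≈ $23.53
    "YandexGPT Pro": rub_per_1k_to_usd_per_m(1.2),      # ≈ $14.12
}
for name, rate in COMBINED.items():
    PRICES[name] = (rate, rate)


def compare(input_tokens: int, output_tokens: int) -> list[tuple[str, float, float, float]]:
    """Rows of (model, API cost, TCO low, TCO high), cheapest API cost first."""
    rows = []
    for name, (p_in, p_out) in PRICES.items():
        api = input_tokens / 1_000_000 * p_in + output_tokens / 1_000_000 * p_out
        rows.append((name, api, api * TCO_RANGE[0], api * TCO_RANGE[1]))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical workload: 100M input tokens, 20M output tokens per month.
for name, api, tco_low, tco_high in compare(100_000_000, 20_000_000):
    print(f"{name:<22} API ${api:>9,.2f}/mo   TCO ~${tco_low:,.0f}-{tco_high:,.0f}")
```

Because GigaChat and YandexGPT bill at one combined rate, their cost scales with total tokens regardless of the input/output split. The on-prem figure assumes the GPU is never idle; at 50% utilization the effective per-token cost doubles.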