API vs Self-Hosting LLM Cost Calculator

Break-even between per-token API pricing and renting GPUs — utilization is the whole story; see yours.

API cost / month ($)
Self-host cost / month ($)
GPU utilization needed (%)
Break-even volume (M tok/mo)

The hidden variable is utilization: a rented H100 serving 2,500 tok/s could process about 6.6B tokens/month at full load, but real traffic is bursty. Utilization of 10–30% is common, which raises effective self-host cost per token by roughly 3–10×. APIs amortize that idle time across customers.
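The arithmetic behind that paragraph can be sketched as follows. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and the only inputs taken from the page are the 2,500 tok/s figure and the 730-hour month.

```python
HOURS_PER_MONTH = 730    # month length used in the page's formula
SECONDS_PER_HOUR = 3600


def monthly_capacity_tokens(tokens_per_second):
    """Tokens one GPU could serve in a month if it were busy 100% of the time."""
    return tokens_per_second * SECONDS_PER_HOUR * HOURS_PER_MONTH


def cost_multiplier(utilization):
    """Per-token cost relative to full load, for utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization


capacity = monthly_capacity_tokens(2500)
print(f"Full-load capacity: {capacity / 1e9:.2f}B tokens/month")

# Idle hours are still billed, so cost per token scales with 1 / utilization.
for u in (0.10, 0.15, 0.30):
    print(f"{u:.0%} utilization -> {cost_multiplier(u):.1f}x cost per token")
```

At 10–30% utilization the multiplier spans roughly 3.3× to 10×, so "bursty" traffic moves the comparison by close to an order of magnitude.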

Formula

API = tokens × price
self = GPUs × $/hr × 730 h
break-even = monthly_GPU_cost ÷ API_price
References: vLLM throughput benchmarks; Public per-token pricing pages (2025–26)
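As a rough sketch, the three formulas translate directly into code. The function names are hypothetical, token volume and price are assumed to be expressed per million tokens (matching the "M tok/mo" break-even output), and the example prices are made-up illustrations, not quotes from any provider.

```python
HOURS_PER_MONTH = 730  # from the page's formula


def api_cost(tokens_millions, price_per_million):
    """API = tokens x price."""
    return tokens_millions * price_per_million


def self_host_cost(gpus, dollars_per_hour):
    """self = GPUs x $/hr x 730 h."""
    return gpus * dollars_per_hour * HOURS_PER_MONTH


def break_even_millions(monthly_gpu_cost, price_per_million):
    """break-even = monthly_GPU_cost / API_price, in millions of tokens."""
    return monthly_gpu_cost / price_per_million


# Illustrative inputs only (assumptions, not real pricing).
tokens_m = 500    # 500M tokens per month
api_price = 0.50  # $ per million tokens, blended
gpu_rate = 2.50   # $ per GPU-hour

gpu_bill = self_host_cost(gpus=1, dollars_per_hour=gpu_rate)
print(f"API cost:       ${api_cost(tokens_m, api_price):,.2f}/month")
print(f"Self-host cost: ${gpu_bill:,.2f}/month")
print(f"Break-even:     {break_even_millions(gpu_bill, api_price):,.0f}M tokens/month")
```

With these placeholder numbers the API side is cheaper until volume reaches the break-even figure. The break-even also has to fit within what the GPU can physically serve, which is where utilization comes in.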

Disclaimer: This tool is for general informational and estimation purposes only and is not professional financial, tax, accounting or legal advice. All figures are estimates; verify with a qualified professional before making decisions.

About API vs Self-Hosting LLM Cost Calculator

Every team running LLM features eventually asks whether the API bill should become a GPU bill. The arithmetic is simple — per-token price versus rental hours — but the verdict hinges on a variable people forget: utilization. A GPU busy 15% of the time costs 6.7× its benchmark price per token. This calculator runs both sides honestly: your monthly tokens against blended API pricing, versus GPUs × hours × rate, with the utilization your throughput implies and the break-even volume where the lines cross. Bring your real traffic numbers; the answer changes completely between 50M and 5B tokens a month.
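To make "the utilization your throughput implies" concrete, here is a small sketch under stated assumptions: the function name is hypothetical, a single GPU is assumed, and the 2,500 tok/s throughput is borrowed from the H100 example earlier on the page.

```python
HOURS_PER_MONTH = 730  # from the page's formula


def utilization_needed(monthly_tokens, gpus, tokens_per_second_per_gpu):
    """Fraction of the fleet's full-load capacity that the traffic would occupy."""
    capacity = gpus * tokens_per_second_per_gpu * 3600 * HOURS_PER_MONTH
    return monthly_tokens / capacity


# The two volumes named in the text: 50M and 5B tokens a month.
for monthly_tokens in (50e6, 5e9):
    u = utilization_needed(monthly_tokens, gpus=1, tokens_per_second_per_gpu=2500)
    print(f"{monthly_tokens / 1e6:,.0f}M tokens/month -> {u:.1%} of one GPU")
```

A result above 100% would mean the assumed fleet cannot serve the volume at all and more GPUs are needed.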

How to use API vs Self-Hosting LLM Cost Calculator

  1. Enter your values into API vs Self-Hosting LLM Cost Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
  2. The result recomputes live using the formula shown on the page; there is no button to press.
  3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use API vs Self-Hosting LLM Cost Calculator?

  • Computes API vs Self-Hosting LLM Cost instantly in your browser — no sign-up, no upload, no server round-trip.
  • 100% free and unlimited, with the exact formula shown: API = tokens × price.
  • Runs entirely client-side, so every value you enter stays private on your device.
  • Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

What does self-hosting cost beyond the GPU?

Engineering time (the big one: serving stacks, monitoring, failover), egress, storage, and the latency and quality improvements that API providers ship at no extra charge. A common rule: add 30–50% to raw GPU cost, more if it's your first deployment. The calculator's GPU rate can absorb this as a loaded cost.
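One way to fold that rule of thumb into the GPU rate input is sketched below. The function name is hypothetical and the $2.50/hr raw rate is an illustrative assumption, not a quoted price; only the 30–50% range comes from the text.

```python
def loaded_gpu_rate(raw_dollars_per_hour, overhead_fraction):
    """Raw rental rate plus a fractional allowance for engineering, egress, storage."""
    if overhead_fraction < 0:
        raise ValueError("overhead_fraction must be non-negative")
    return raw_dollars_per_hour * (1 + overhead_fraction)


raw_rate = 2.50  # $ per GPU-hour, illustrative assumption

# The 30-50% range from the rule of thumb above.
for overhead in (0.30, 0.50):
    rate = loaded_gpu_rate(raw_rate, overhead)
    print(f"{overhead:.0%} overhead -> ${rate:.2f}/hr loaded")
```

Entering the loaded rate instead of the raw one raises the break-even volume by the same percentage.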

Why are API prices sometimes BELOW raw GPU cost?

Providers batch many customers onto each GPU at near-100% utilization, use custom kernels/quantization, and sometimes price flagship models as loss leaders. For bursty workloads under ~100M tokens/month, beating a competitive API on cost is genuinely hard.

When does self-hosting clearly win?

Steady high volume (utilization >40%), strict data-residency requirements, fine-tuned models that APIs won't host, latency floors needing colocation, or output-heavy workloads where generated tokens far outnumber input tokens (APIs price output 3–5× input). Several of these together make the case decisive.
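The effect of the output premium on a blended API price can be sketched as below. The function name is hypothetical, and the $1.00 input price and 4× multiple are illustrative assumptions chosen from inside the 3–5× range mentioned above.

```python
def blended_price_per_million(input_price, output_multiple, output_share):
    """Average $ per million tokens when output costs a multiple of input.

    output_share is the fraction of all tokens that are generated output.
    """
    if not 0 <= output_share <= 1:
        raise ValueError("output_share must be in [0, 1]")
    output_price = input_price * output_multiple
    return (1 - output_share) * input_price + output_share * output_price


# Illustrative assumptions: $1.00 per M input tokens, output priced at 4x input.
for share in (0.2, 0.8):
    price = blended_price_per_million(1.00, 4, share)
    print(f"{share:.0%} output tokens -> ${price:.2f} per M tokens blended")
```

A self-hosted GPU bills by the hour regardless of the input/output mix, so a higher blended API price lowers the break-even volume.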

How do I estimate my throughput input?

Benchmark with your actual prompt/output length mix: vLLM on one H100 does ~2–5K tok/s for an 8B model under mixed traffic, and ~10× less for a 70B model. Throughput for short-prompt chat can differ from long-context RAG by about 3×. The utilization output then indicates how many GPUs you need at peak versus on average.
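A minimal sketch of the peak-versus-average sizing follows. The function names and the demand figures (1,200 tok/s average, 6,000 tok/s peak) are hypothetical; the 2,500 tok/s per-GPU throughput stands in for whatever your own benchmark produces.

```python
import math


def gpus_needed(demand_tokens_per_second, per_gpu_tokens_per_second):
    """Smallest whole number of GPUs that can serve the given demand."""
    return math.ceil(demand_tokens_per_second / per_gpu_tokens_per_second)


def average_utilization(avg_demand, fleet_size, per_gpu_tokens_per_second):
    """Share of the fleet's capacity that average traffic actually uses."""
    return avg_demand / (fleet_size * per_gpu_tokens_per_second)


per_gpu = 2500      # tok/s, replace with your measured throughput
avg_demand = 1200   # tok/s, hypothetical average traffic
peak_demand = 6000  # tok/s, hypothetical peak traffic

fleet_for_peak = gpus_needed(peak_demand, per_gpu)
fleet_for_avg = gpus_needed(avg_demand, per_gpu)
util = average_utilization(avg_demand, fleet_for_peak, per_gpu)

print(f"GPUs for average load: {fleet_for_avg}")
print(f"GPUs for peak load:    {fleet_for_peak}")
print(f"Average utilization of the peak-sized fleet: {util:.0%}")
```

Sizing for peak is what pushes average utilization down into the 10–30% range for bursty traffic.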
