API vs Self-Hosting LLM Cost Calculator
Break-even between per-token API pricing and renting GPUs — utilization is the whole story; see yours.
The hidden variable is utilization. A rented H100 serving 2,500 tok/s could process about 6.6B tokens a month if it ran flat out, but real traffic is bursty: 10–30% utilization is common, which raises the effective self-hosted cost per token by 3–10×. APIs amortize that idle capacity across many customers.
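The utilization arithmetic above can be checked with a short Python sketch. It assumes an average month of 730 hours (8,760 / 12), which is what reproduces the ~6.6B-token figure; the function names are hypothetical and not taken from the calculator's own code.

```python
# Assumption: an average month of 730 hours (8,760 h / 12).
HOURS_PER_MONTH = 730


def monthly_capacity_tokens(tokens_per_s: float, hours: float = HOURS_PER_MONTH) -> float:
    """Tokens one GPU could serve if it were busy every hour of the month."""
    return tokens_per_s * 3600 * hours


def cost_multiplier(utilization: float) -> float:
    """Effective cost per served token relative to the fully busy benchmark."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization


capacity = monthly_capacity_tokens(2_500)
print(f"capacity: {capacity / 1e9:.2f}B tokens/month")
for u in (0.10, 0.15, 0.30):
    print(f"{u:.0%} utilization -> {cost_multiplier(u):.1f}x benchmark cost per token")
```

This prints a capacity of 6.57B tokens a month and multipliers of 10.0×, 6.7× and 3.3×, matching the 10–30% range above and the 15% example below.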
Formula
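The cost model this page describes can be written out as below. This is a sketch only: the symbols are illustrative names, not taken from the calculator's code. T is monthly tokens, p the blended API price per token, G the number of GPUs, H the rented hours per month, r the hourly rate per GPU, and θ the benchmarked throughput of one GPU in tokens per second.

```latex
\begin{aligned}
C_{\text{API}}  &= T \cdot p \\
C_{\text{self}} &= G \cdot H \cdot r \\
u &= \frac{T}{3600 \cdot G \cdot H \cdot \theta} \\
c_{\text{self}} &= \frac{C_{\text{self}}}{T} = \frac{r}{3600 \cdot \theta \cdot u} \\
T^{*} &= \frac{G \cdot H \cdot r}{p}
\end{aligned}
```

The fourth line shows why utilization dominates: the per-token self-hosting cost scales with 1/u. The break-even volume T* is only reachable if it does not exceed the fleet's capacity 3600·G·H·θ; otherwise G has to grow and the self-hosting line steps up.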
Disclaimer: This tool is for general informational and estimation purposes only and is not professional financial, tax, accounting or legal advice. All figures are estimates; verify with a qualified professional before making decisions.
About API vs Self-Hosting LLM Cost Calculator
Every team running LLM features eventually asks whether the API bill should become a GPU bill. The arithmetic is simple — per-token price versus rental hours — but the verdict hinges on a variable people forget: utilization. A GPU busy 15% of the time costs 6.7× its benchmark price per token. This calculator runs both sides honestly: your monthly tokens against blended API pricing, versus GPUs × hours × rate, with the utilization your throughput implies and the break-even volume where the lines cross. Bring your real traffic numbers; the answer changes completely between 50M and 5B tokens a month.
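The two sides of the comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the calculator's source: the `Scenario` fields, the function names, the 730-hour month and the example prices are all hypothetical, and the GPU count is held fixed rather than scaled with traffic.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average month (8,760 h / 12)


@dataclass
class Scenario:
    monthly_tokens: float          # tokens served per month
    api_price_per_m: float         # blended API price, $ per 1M tokens
    gpu_rate_per_hour: float       # $ per GPU-hour (raw or loaded)
    tokens_per_s_per_gpu: float    # benchmarked throughput of one GPU
    gpus: int = 1
    hours: float = HOURS_PER_MONTH


def api_cost(s: Scenario) -> float:
    """API bill: tokens x blended price."""
    return s.monthly_tokens / 1e6 * s.api_price_per_m


def self_host_cost(s: Scenario) -> float:
    """GPU bill: GPUs x hours x rate, paid whether or not the GPUs are busy."""
    return s.gpus * s.hours * s.gpu_rate_per_hour


def utilization(s: Scenario) -> float:
    """Fraction of the fleet's monthly token capacity that the traffic uses."""
    capacity = s.gpus * s.tokens_per_s_per_gpu * 3600 * s.hours
    return s.monthly_tokens / capacity


def break_even_tokens(s: Scenario) -> float:
    """Monthly volume at which the API bill equals the fixed GPU bill."""
    return self_host_cost(s) * 1e6 / s.api_price_per_m


# Hypothetical inputs for illustration only; substitute your own prices and benchmark.
s = Scenario(monthly_tokens=500e6, api_price_per_m=1.00,
             gpu_rate_per_hour=2.50, tokens_per_s_per_gpu=2_500)
print(f"API:         ${api_cost(s):,.0f}/month")
print(f"Self-host:   ${self_host_cost(s):,.0f}/month")
print(f"Utilization: {utilization(s):.1%}")
print(f"Break-even:  {break_even_tokens(s) / 1e6:,.0f}M tokens/month")
```

With these made-up numbers the API costs $500 a month against $1,825 for the GPU, utilization is about 7.6%, and the lines cross at 1,825M tokens a month. `break_even_tokens` ignores capacity, so check that the result stays below what the fleet can actually serve.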
How to use API vs Self-Hosting LLM Cost Calculator
1. Enter your values into API vs Self-Hosting LLM Cost Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2. The result recomputes live using the formula shown on the page; there is no button to press.
3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.
Why use API vs Self-Hosting LLM Cost Calculator?
- Computes API vs self-hosting LLM cost instantly in your browser: no sign-up, no upload, no server round-trip.
- 100% free and unlimited, with the exact formulas shown: API cost = tokens × price; self-hosting cost = GPUs × hours × rate.
- Runs entirely client-side, so every value you enter stays private on your device.
- Recomputes live as you type, with a worked example and authoritative references.
Frequently asked questions
What does self-hosting cost beyond the GPU?
Engineering time is the big one (serving stacks, monitoring, failover), followed by egress, storage, and the latency and quality tuning that an API provider otherwise does for you at no extra charge. A common rule of thumb is to add 30–50% to the raw GPU cost, more if it's your first deployment. Enter a loaded GPU rate in the calculator to absorb this overhead.
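Turning a raw GPU price into a loaded rate is a single multiplication. A minimal sketch, assuming the overhead is expressed as a fraction of the raw rate; the function name and the $2.50/GPU-hour figure are hypothetical.

```python
def loaded_gpu_rate(raw_rate_per_hour: float, overhead: float) -> float:
    """Raw $/GPU-hour scaled up by a fractional overhead (0.30 means +30%)."""
    if overhead < 0:
        raise ValueError("overhead must be non-negative")
    return raw_rate_per_hour * (1 + overhead)


raw_rate = 2.50  # hypothetical $/GPU-hour
for overhead in (0.30, 0.50):
    print(f"+{overhead:.0%} overhead: ${loaded_gpu_rate(raw_rate, overhead):.2f}/GPU-hour")
```

The 30–50% band turns a $2.50 raw rate into $3.25–$3.75 per GPU-hour, which is the figure to enter as the GPU rate.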
Why are API prices sometimes below raw GPU cost?
Providers batch many customers' requests onto each GPU and run it at near-100% utilization, use custom kernels and quantization, and sometimes price flagship models as loss leaders. For bursty workloads under roughly 100M tokens a month, beating a competitive API on cost is genuinely hard.
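The low-volume case can be quantified by dividing one GPU's monthly bill by the tokens it actually serves. A rough sketch, assuming a single always-on GPU, a 730-hour month, and hypothetical rate and throughput figures; the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average month


def self_host_price_per_m(gpu_rate_per_hour: float, tokens_per_s: float,
                          monthly_tokens: float, hours: float = HOURS_PER_MONTH) -> float:
    """Effective $ per 1M tokens for one always-on GPU serving monthly_tokens."""
    capacity = tokens_per_s * 3600 * hours
    if monthly_tokens > capacity:
        raise ValueError("traffic exceeds one GPU's capacity; add GPUs")
    monthly_bill = gpu_rate_per_hour * hours
    return monthly_bill / (monthly_tokens / 1e6)


rate, tps = 2.50, 2_500  # hypothetical $/GPU-hour and benchmark tok/s
capacity = tps * 3600 * HOURS_PER_MONTH
for tokens in (100e6, capacity):
    price = self_host_price_per_m(rate, tps, tokens)
    print(f"{tokens / 1e6:>7,.0f}M tokens/month -> ${price:.2f} per 1M")
```

Under these made-up inputs, a GPU serving 100M tokens a month costs $18.25 per 1M tokens versus about $0.28 per 1M when fully busy. That roughly 66× gap is the room a high-utilization provider has to undercut you.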
When does self-hosting clearly win?
Steady high volume (utilization above 40%), strict data-residency requirements, fine-tuned models that APIs won't host, latency floors that require colocation, or output-heavy workloads where output tokens far outnumber input (APIs typically price output tokens 3–5× higher than input). When several of these apply together, the case becomes decisive.
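The effect of the output premium on the API side can be sketched as a weighted average of input and output prices. A minimal sketch, assuming a flat per-token price for each direction; the $1.00 input price and 4× multiplier are hypothetical values inside the 3–5× range above.

```python
def blended_price_per_m(input_price_per_m: float, output_multiplier: float,
                        output_share: float) -> float:
    """Average API $ per 1M tokens for a given input/output token mix.

    output_share: fraction of all tokens that are output (completion) tokens.
    output_multiplier: output price divided by input price.
    """
    if not 0 <= output_share <= 1:
        raise ValueError("output_share must be in [0, 1]")
    output_price_per_m = input_price_per_m * output_multiplier
    return (1 - output_share) * input_price_per_m + output_share * output_price_per_m


# Hypothetical $1.00 per 1M input tokens with output priced 4x input.
for share in (0.2, 0.8):
    print(f"output share {share:.0%}: ${blended_price_per_m(1.00, 4.0, share):.2f} per 1M")
```

Moving from input-heavy (20% output) to output-heavy (80% output) traffic raises the blended API price from $1.60 to $3.40 per 1M tokens in this example. This only models the API side; self-hosted throughput also shifts with the mix, so re-benchmark rather than assume it stays fixed.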
How do I estimate my throughput input?
Benchmark with your actual mix of prompt and output lengths. vLLM on one H100 delivers roughly 2–5K tok/s for an 8B model under mixed traffic, and about 10× less for a 70B model. Throughput for short-prompt chat can differ by about 3× from long-context RAG. The utilization output then tells you how many GPUs you need at peak versus on average.
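The peak-versus-average sizing can be sketched as below. It assumes traffic is described by a monthly volume and a single peak-to-average ratio, with a 730-hour month; the volume, the 4× ratio and the 2,500 tok/s benchmark are hypothetical placeholders for your own measurements.

```python
import math

HOURS_PER_MONTH = 730  # assumption: average month


def gpus_needed(load_tokens_per_s: float, tokens_per_s_per_gpu: float) -> int:
    """Smallest whole number of GPUs whose combined throughput covers the load."""
    return math.ceil(load_tokens_per_s / tokens_per_s_per_gpu)


# Hypothetical traffic profile; replace with your own measurements.
monthly_tokens = 2e9
peak_to_average = 4.0      # how much busier the peak hour is than the average
per_gpu_tps = 2_500        # from your own benchmark

avg_tps = monthly_tokens / (HOURS_PER_MONTH * 3600)
peak_tps = avg_tps * peak_to_average
fleet = gpus_needed(peak_tps, per_gpu_tps)
fleet_utilization = avg_tps / (fleet * per_gpu_tps)

print(f"average load {avg_tps:,.0f} tok/s -> {gpus_needed(avg_tps, per_gpu_tps)} GPU(s)")
print(f"peak load    {peak_tps:,.0f} tok/s -> {fleet} GPU(s)")
print(f"utilization when sized for peak: {fleet_utilization:.0%}")
```

The average load of about 761 tok/s fits on one GPU, but a 4× peak needs two, which leaves average utilization near 15%: the regime where the per-token penalty is roughly 6.7×.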
Related tools
- Knowledge Distillation Compression Calculator
- Pruning & Sparsity Savings Calculator
- GPU Electricity Cost Calculator
- LLM Batching Throughput & Latency Calculator
- Custom LLM VRAM Calculator (Any Architecture)
- Context Window Token Budget Calculator
- Manufacturing Defect Detection — Confusion Matrix & Metrics Calculator
- F-beta Score Calculator
Related ML & AI tools
- ROC-AUC Calculator (from TPR/FPR points): trapezoidal area under the ROC curve from your (FPR, TPR) operating points, the threshold-independent ranking score.
- Classification Threshold Cost Calculator: find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.
- Silhouette Score Calculator: cluster cohesion vs separation for one point, the building block of the silhouette metric for choosing K.