LLM Training GPU-Hours & Cost Calculator

Turn a training budget (params × tokens) into GPU-hours, wall-clock days and rental dollars at your MFU.

Model parameters (B)
Training tokens (B)
GPU
Number of GPUs
Model FLOPs utilization (%)
Rental price per GPU-hour ($)

Total GPU-hours

Wall-clock days

Rental cost ($)

Defaults: a 7B model on 2T tokens at 40% MFU on 64 H100s ≈ 38 days, assuming a dense BF16 peak of about 989 TFLOPS per H100. MFU of 35–45% is realistic at scale; small clusters with slow interconnect often sit nearer 25%.

Formula

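GPU-hours = 6·N·D ÷ (peak_FLOPS × MFU) ÷ 3600
Wall-clock hours = GPU-hours ÷ n_GPUs
Wall-clock days = wall-clock hours ÷ 24
Rental cost = GPU-hours × price per GPU-hour

Here N is the parameter count, D is the number of training tokens, peak_FLOPS is the per-GPU peak in FLOP/s, and MFU is a fraction (40% = 0.40).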
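The formula can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own source: the function name, the 989 TFLOPS dense BF16 peak for an H100, and the $2.00 rental price are all assumptions chosen for the example.

```python
# Sketch of the page's formula. H100_PEAK_FLOPS and the $2.00 price are
# assumptions for illustration, not values taken from the calculator.
H100_PEAK_FLOPS = 989e12  # assumed dense BF16 peak per GPU, in FLOP/s


def training_estimate(params, tokens, peak_flops, mfu, n_gpus, price_per_gpu_hour):
    """Return (gpu_hours, wall_clock_days, rental_cost) from the 6ND rule."""
    total_flops = 6 * params * tokens               # 6·N·D
    gpu_seconds = total_flops / (peak_flops * mfu)  # divide by sustained FLOP/s per GPU
    gpu_hours = gpu_seconds / 3600
    wall_clock_days = gpu_hours / n_gpus / 24
    rental_cost = gpu_hours * price_per_gpu_hour
    return gpu_hours, wall_clock_days, rental_cost


gpu_hours, days, cost = training_estimate(
    params=7e9, tokens=2e12, peak_flops=H100_PEAK_FLOPS,
    mfu=0.40, n_gpus=64, price_per_gpu_hour=2.00,
)
print(f"{gpu_hours:,.0f} GPU-hours, {days:.1f} days, ${cost:,.0f}")
```

Note that wall-clock time scales with the number of GPUs while total GPU-hours and cost do not, as long as MFU is held constant.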

References: Chowdhery et al. (2022), PaLM (introduced MFU metric); Hoffmann et al. (2022), Chinchilla

Disclaimer: This tool is for general informational and estimation purposes only and is not professional financial, tax, accounting or legal advice. All figures are estimates; verify with a qualified professional before making decisions.

About LLM Training GPU-Hours & Cost Calculator

Before any training run, three numbers decide feasibility: GPU-hours, calendar time and dollars. This calculator chains the standard 6ND compute rule (about 6 FLOPs per parameter per training token) with your hardware's peak throughput and a realistic Model FLOPs Utilization (MFU) to produce all three. MFU, the fraction of theoretical peak FLOPS your training loop actually sustains, is the efficiency metric introduced by the PaLM paper; 40% is good at scale, 50%+ is exceptional. Compare GPUs, cluster sizes and token budgets to find where your budget actually breaks.
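To see where a budget breaks, the same relation can be inverted: fix the dollars and solve for the token count. The sketch below assumes a hypothetical $100,000 budget, a $2.00 GPU-hour price and a 989 TFLOPS dense BF16 peak; the function name is illustrative only.

```python
def affordable_tokens(budget_usd, price_per_gpu_hour, params, peak_flops, mfu):
    """Largest token count D that fits the budget under the 6ND rule."""
    gpu_hours = budget_usd / price_per_gpu_hour
    total_flops = gpu_hours * 3600 * peak_flops * mfu  # FLOPs the budget buys
    return total_flops / (6 * params)                  # solve 6·N·D = total_flops for D


# Assumed inputs: 7B model, 40% MFU, H100-class peak, hypothetical budget and price.
tokens = affordable_tokens(
    budget_usd=100_000, price_per_gpu_hour=2.00,
    params=7e9, peak_flops=989e12, mfu=0.40,
)
print(f"{tokens / 1e12:.2f}T tokens")
```

Because cost depends only on total GPU-hours, the cluster size does not appear here; it changes the calendar time, not the bill.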

How to use LLM Training GPU-Hours & Cost Calculator

1. Enter your values into LLM Training GPU-Hours & Cost Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2. The result recomputes live using the formula shown on the page; there is no button to press.
3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use LLM Training GPU-Hours & Cost Calculator?

✓ Computes LLM training GPU-hours and cost instantly in your browser: no sign-up, no upload, no server round-trip.
✓ 100% free and unlimited, with the exact formula shown: GPU-hours = 6·N·D ÷ (peak_FLOPS × MFU) ÷ 3600.
✓ Runs entirely client-side, so every value you enter stays private on your device.
✓ Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

What MFU should I assume?

Well-tuned large-scale runs on H100s with FlashAttention and fused kernels report 38–45%. Small clusters bottlenecked on PCIe or Ethernet, or runs with heavy activation checkpointing, can fall to 20–30%. Measure a short run before extrapolating a long one.
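Measuring MFU from a short run can be sketched as follows. This uses the 6N FLOPs-per-token approximation and ignores the attention term, so it slightly understates MFU for long sequences. The throughput figure and the 989 TFLOPS peak are assumed values, and the function name is hypothetical.

```python
def measured_mfu(params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Approximate MFU from observed cluster-wide training throughput."""
    achieved_flops = 6 * params * tokens_per_second  # 6N FLOPs per token
    peak_flops = n_gpus * peak_flops_per_gpu         # theoretical cluster peak
    return achieved_flops / peak_flops


# Assumed measurement: a 7B model sustaining 600k tokens/s on 64 GPUs.
mfu = measured_mfu(
    params=7e9, tokens_per_second=600_000,
    n_gpus=64, peak_flops_per_gpu=989e12,
)
print(f"MFU = {mfu:.1%}")
```

Take `tokens_per_second` from steady-state steps, after warm-up and compilation, so one-off startup costs do not skew the estimate.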

Why do real runs cost more than this estimate?

Restarts and node failures, evaluation passes, ablations, data pipeline stalls and the runs you throw away. Teams commonly budget 1.3–2× the napkin number. The calculator gives the irreducible floor — the 6ND physics — not the project cost.
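The 1.3–2× rule of thumb turns the floor into a budget range. A minimal sketch, assuming a hypothetical floor of 50,000 GPU-hours at $2.00 per GPU-hour:

```python
def project_budget_range(floor_gpu_hours, price_per_gpu_hour, low=1.3, high=2.0):
    """Scale the 6ND floor cost by the overhead multipliers quoted above."""
    floor_cost = floor_gpu_hours * price_per_gpu_hour
    return floor_cost * low, floor_cost * high


# Hypothetical inputs for illustration only.
budget_low, budget_high = project_budget_range(50_000, 2.00)
print(f"Plan for ${budget_low:,.0f} to ${budget_high:,.0f}")
```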

Does this apply to fine-tuning too?

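Yes. Fine-tuning is the same 6ND arithmetic with a much smaller D. A 7B full fine-tune on 1B tokens is ~42e18 FLOPs ≈ 30 H100-hours at 40% MFU. LoRA mainly reduces optimizer and gradient memory; it skips the weight-gradient matmuls for the frozen base weights (roughly a third of the 6ND), but the forward pass and the activation-gradient backward pass through the base model remain.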
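The fine-tuning figures can be checked directly. The only assumption beyond the numbers quoted above is the H100 peak, taken here as 989 TFLOPS dense BF16.

```python
PARAMS = 7e9               # 7B parameters
TOKENS = 1e9               # 1B fine-tuning tokens
H100_PEAK_FLOPS = 989e12   # assumed dense BF16 peak, in FLOP/s
MFU = 0.40

total_flops = 6 * PARAMS * TOKENS                          # full fine-tune, 6ND
h100_hours = total_flops / (H100_PEAK_FLOPS * MFU) / 3600  # GPU-seconds to GPU-hours

print(f"{total_flops:.2e} FLOPs, {h100_hours:.1f} H100-hours")
```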

FP8 training: does it halve the time?

On H100-class hardware FP8 roughly doubles peak matmul FLOPS, but end-to-end speedups are typically 1.2–1.5× because non-matmul work and communication do not accelerate. Model it here by raising effective TFLOPS or MFU accordingly.
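One way to model this is to divide a BF16 estimate by the end-to-end speedup, or equivalently to lower the MFU entered against the doubled FP8 peak. The sketch below assumes a hypothetical 60,000 GPU-hour BF16 baseline at 40% MFU; the function name and the 2× peak ratio parameter are illustrative.

```python
def fp8_adjusted(bf16_gpu_hours, bf16_mfu, end_to_end_speedup, fp8_peak_ratio=2.0):
    """Return (gpu_hours, equivalent MFU measured against the FP8 peak)."""
    gpu_hours = bf16_gpu_hours / end_to_end_speedup
    # Same sustained FLOP/s expressed as a fraction of the larger FP8 peak.
    mfu_vs_fp8_peak = bf16_mfu * end_to_end_speedup / fp8_peak_ratio
    return gpu_hours, mfu_vs_fp8_peak


# Hypothetical baseline, evaluated at both ends of the 1.2-1.5x range.
low = fp8_adjusted(60_000, 0.40, end_to_end_speedup=1.2)
high = fp8_adjusted(60_000, 0.40, end_to_end_speedup=1.5)
for hours, mfu in (low, high):
    print(f"{hours:,.0f} GPU-hours, {mfu:.0%} MFU against the FP8 peak")
```

If you switch the peak to the FP8 figure, remember to lower MFU as shown; keeping 40% against a doubled peak would assume the full 2× speedup.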

Related ML & AI tools

ROC-AUC Calculator (from TPR/FPR points)

Trapezoidal area under the ROC curve from your (FPR, TPR) operating points, a threshold-independent ranking score.

Classification Threshold Cost Calculator

Find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.

Silhouette Score Calculator

Cluster cohesion vs separation for one point, the building block of the silhouette metric for choosing K.