
LLM Training GPU-Hours & Cost Calculator

Turn a training budget (params × tokens) into GPU-hours, wall-clock days and rental dollars at your MFU.

Outputs: total GPU-hours, wall-clock days, rental cost ($)

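Defaults: a 7B model on 2T tokens at 40% MFU on 64 H100s ≈ 38 days, taking the H100's dense BF16 peak of about 989 TFLOPS (6 × 7e9 × 2e12 = 8.4e22 FLOPs ≈ 59,000 GPU-hours). MFU of 35–45% is realistic at scale; small clusters with slow interconnect often sit nearer 25%.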

Formula

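GPU-hours = 6·N·D ÷ (peak_FLOPS × MFU) ÷ 3600
Wall-clock hours = GPU-hours ÷ n_GPUs (divide by 24 for days)
Here N is the parameter count, D is the number of training tokens, peak_FLOPS is the per-GPU peak in FLOP/s, and MFU is a fraction between 0 and 1.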
References: Chowdhery et al. (2022), PaLM (introduced MFU metric); Hoffmann et al. (2022), Chinchilla
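The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual implementation: the function name `training_estimate`, the 989e12 FLOP/s peak (the H100 SXM dense BF16 figure) and the $2.50 per GPU-hour rental rate are all assumptions chosen for the example, and the rental cost is taken to be simply GPU-hours times the hourly rate.

```python
def training_estimate(n_params, n_tokens, peak_flops, mfu, n_gpus, usd_per_gpu_hour):
    """Return (gpu_hours, wall_clock_days, rental_cost_usd) from the 6ND rule."""
    total_flops = 6 * n_params * n_tokens      # approximate training compute (6ND)
    sustained = peak_flops * mfu               # FLOP/s one GPU actually delivers
    gpu_hours = total_flops / sustained / 3600
    wall_clock_days = gpu_hours / n_gpus / 24  # assumes all GPUs run in parallel
    rental_cost = gpu_hours * usd_per_gpu_hour # assumed: flat hourly rental rate
    return gpu_hours, wall_clock_days, rental_cost


# Assumed inputs: 7B params, 2T tokens, H100 dense BF16 peak, 40% MFU,
# 64 GPUs, and a hypothetical $2.50 per GPU-hour.
gpu_hours, days, cost = training_estimate(7e9, 2e12, 989e12, 0.40, 64, 2.50)
print(f"{gpu_hours:,.0f} GPU-hours, {days:.1f} days, ${cost:,.0f}")
```

The result scales inversely with the assumed peak, so substituting a different peak figure (for example a PCIe part or a sparse-throughput number) shifts every output proportionally.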

Disclaimer: This tool is for general informational and estimation purposes only and is not professional financial, tax, accounting or legal advice. All figures are estimates; verify with a qualified professional before making decisions.

About LLM Training GPU-Hours & Cost Calculator

Before any training run, three numbers decide feasibility: GPU-hours, calendar time and dollars. This calculator chains the standard 6ND compute rule with your hardware's peak throughput and a realistic Model-FLOPs-Utilization to produce all three. MFU — the fraction of theoretical FLOPS your training loop actually sustains — is the honest efficiency metric introduced by the PaLM paper; 40% is good at scale, 50%+ is exceptional. Compare GPUs, cluster sizes and token budgets to find where your budget actually breaks.

How to use LLM Training GPU-Hours & Cost Calculator

  1. Enter your values into LLM Training GPU-Hours & Cost Calculator; sensible, domain-typical defaults are pre-filled so you see a real result immediately.
  2. The result recomputes live using the formula shown on the page; there is no button to press.
  3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use LLM Training GPU-Hours & Cost Calculator?

  • Computes LLM Training GPU-Hours & Cost instantly in your browser — no sign-up, no upload, no server round-trip.
  • 100% free and unlimited, with the exact formula shown: GPU-hours = 6·N·D ÷ (peak_FLOPS × MFU) ÷ 3600.
  • Runs entirely client-side, so every value you enter stays private on your device.
  • Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

What MFU should I assume?

Well-tuned large-scale runs on H100s with FlashAttention and fused kernels report 38–45%. Small clusters bottlenecked on PCIe or Ethernet, or runs with heavy activation checkpointing, can fall to 20–30%. Measure a short run before extrapolating a long one.
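One way to measure MFU from a short run is to compare the model FLOP/s implied by observed token throughput against the cluster's aggregate peak. The sketch below assumes the 6N-FLOPs-per-token approximation (it ignores the attention term, so it slightly understates MFU at long sequence lengths); the function name and the throughput figure are hypothetical.

```python
def measured_mfu(n_params, tokens_per_second, n_gpus, peak_flops):
    """Estimate MFU from observed cluster-wide training throughput."""
    achieved = 6 * n_params * tokens_per_second  # model FLOP/s across the cluster
    available = n_gpus * peak_flops              # aggregate theoretical peak
    return achieved / available


# Hypothetical short run: 7B model, 64 H100s (989e12 FLOP/s dense BF16 each),
# logging 600,000 tokens per second across the whole cluster.
mfu = measured_mfu(7e9, 600_000, 64, 989e12)
print(f"MFU = {mfu:.1%}")
```

Use tokens per second averaged over steady-state steps, after warm-up and excluding checkpoint or evaluation pauses, so the figure reflects the training loop itself.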

Why do real runs cost more than this estimate?

Restarts and node failures, evaluation passes, ablations, data pipeline stalls and the runs you throw away. Teams commonly budget 1.3–2× the napkin number. The calculator gives the irreducible floor — the 6ND physics — not the project cost.
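The 1.3–2× rule of thumb can be applied as a simple range on top of the calculator's floor. This is only a sketch: `project_budget` is a hypothetical helper, and the 59,000 GPU-hour floor is an illustrative input rather than a measured value.

```python
def project_budget(floor_gpu_hours, low_factor=1.3, high_factor=2.0):
    """Scale the 6ND floor by an overhead range to get a planning budget."""
    return floor_gpu_hours * low_factor, floor_gpu_hours * high_factor


# Illustrative floor of 59,000 GPU-hours for a single clean run.
budget_low, budget_high = project_budget(59_000)
print(f"Plan for {budget_low:,.0f} to {budget_high:,.0f} GPU-hours")
```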

Does this apply to fine-tuning too?

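Yes. Fine-tuning is the same 6ND arithmetic with a much smaller D. A 7B full fine-tune on 1B tokens is 6 × 7e9 × 1e9 = 4.2e19 FLOPs ≈ 30 H100-hours at 40% MFU. LoRA mainly cuts optimizer and gradient memory: the forward pass and the activation-gradient part of the backward pass still run through the full base model, so at most the weight-gradient matmuls for the frozen weights (roughly a third of 6ND) are saved.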
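The full fine-tuning figure can be checked by substituting the numbers directly. The 989e12 FLOP/s peak (H100 SXM dense BF16) is an assumption; the other inputs come from the answer above.

```python
n_params = 7e9        # 7B model
n_tokens = 1e9        # 1B fine-tuning tokens
peak_flops = 989e12   # assumed H100 dense BF16 peak, FLOP/s
mfu = 0.40

total_flops = 6 * n_params * n_tokens               # 6ND compute
gpu_hours = total_flops / (peak_flops * mfu) / 3600

print(f"{total_flops:.2e} FLOPs, {gpu_hours:.1f} H100-hours")
```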

FP8 training — does it halve the time?

On H100-class hardware FP8 roughly doubles peak matmul FLOPS, but end-to-end speedups are typically 1.2–1.5× because non-matmul work and communication do not accelerate. Model it here by raising effective TFLOPS or MFU accordingly.
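The gap between a 2× peak and a 1.2–1.5× end-to-end gain follows from Amdahl's law: only the matmul share of step time speeds up. The sketch below is an illustration under that simplification; the matmul fractions are hypothetical, not measurements, and `amdahl_speedup` is a name introduced here.

```python
def amdahl_speedup(accelerated_fraction, factor=2.0):
    """End-to-end speedup when only part of the step time runs `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical shares of step time spent in FP8-eligible matmuls.
for matmul_share in (1 / 3, 0.5, 2 / 3):
    speedup = amdahl_speedup(matmul_share)
    print(f"matmul share {matmul_share:.0%}: {speedup:.2f}x end-to-end")

# Applying a measured end-to-end speedup to a BF16 estimate of GPU-hours.
bf16_gpu_hours = 59_000                  # illustrative baseline
fp8_gpu_hours = bf16_gpu_hours / 1.3     # assumed 1.3x observed speedup
```

Dividing baseline GPU-hours by the observed end-to-end speedup is equivalent to raising effective TFLOPS by the same factor in the calculator.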
