Weights, KV cache and the GPU that actually fits — free calculators for Llama, Mistral, Qwen and more.
Estimate GPU memory to run Llama 3 8B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.
Estimate GPU memory to run Llama 3 70B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.
Estimate GPU memory to run Mistral 7B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.
Estimate GPU memory to run Qwen2.5 7B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.
Full inference-memory budget for ANY transformer from raw config.json fields — weights, KV cache, overhead.
Compute per-token and total KV-cache memory for any model from its architecture fields, the workhorse for long-context budgeting.
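
As a rough illustration of the arithmetic behind the calculators listed above, here is a minimal Python sketch. It is not ToolJolt's actual implementation. The function names are hypothetical. The field names num_hidden_layers and num_key_value_heads follow the Hugging Face config.json convention, and head_dim is assumed to be hidden_size / num_attention_heads. This page does not say how overhead is modeled, so overhead_fraction is a placeholder that defaults to zero.

```python
# Bytes per weight at each precision named on this page.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def kv_cache_bytes_per_token(num_hidden_layers, num_key_value_heads,
                             head_dim, kv_bytes=2.0):
    """KV-cache bytes needed for one token of context.

    The factor of 2 covers one key and one value vector per layer and
    per KV head. kv_bytes is the size of a cached element (2.0 = FP16).
    """
    return 2 * num_hidden_layers * num_key_value_heads * head_dim * kv_bytes


def inference_memory_bytes(num_params, precision, num_hidden_layers,
                           num_key_value_heads, head_dim, context_tokens,
                           batch_size=1, kv_bytes=2.0, overhead_fraction=0.0):
    """Weights plus KV cache, scaled by an assumed overhead fraction.

    overhead_fraction is a hypothetical knob (for example 0.1 for 10%),
    not a value taken from this page.
    """
    weights = num_params * BYTES_PER_PARAM[precision]
    kv_cache = (kv_cache_bytes_per_token(num_hidden_layers,
                                         num_key_value_heads,
                                         head_dim, kv_bytes)
                * context_tokens * batch_size)
    return (weights + kv_cache) * (1.0 + overhead_fraction)


if __name__ == "__main__":
    # Illustrative inputs only: 32 layers, 8 KV heads, head_dim 128,
    # a round 8e9 parameters, FP16 weights and cache, 8192-token context.
    per_token = kv_cache_bytes_per_token(32, 8, 128)
    total = inference_memory_bytes(8_000_000_000, "fp16", 32, 8, 128, 8192)
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"Total estimate: {total / 1e9:.2f} GB")
```

With these illustrative inputs the KV cache costs 128 KiB per token, so an 8192-token context adds about 1 GiB on top of roughly 16 GB of FP16 weights. Real deployments also need room for activations and framework buffers, which is what the overhead term stands in for.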
Every tool on ToolJolt is free, runs in your browser and needs no sign-up.