How Much VRAM Do You Need to Run an LLM?

⚡ ToolJolt · Free Web Story

Weights, KV cache and the GPU that actually fits: free calculators for Llama, Mistral, Qwen and more.

Llama 3 8B VRAM Calculator

Estimate the GPU memory needed to run Llama 3 8B (weights, KV cache and overhead at FP16, INT8 or INT4), with a verdict on which GPUs it fits.
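As a rough illustration of how such an estimate can be put together, the sketch below adds weight memory (parameter count × bytes per weight) to the KV cache and a flat overhead. It is not ToolJolt's calculator. The 8.03B parameter count and the layer and head figures come from Llama 3 8B's published config; the FP16 KV cache, the 8,192-token context, batch size 1 and the 10% overhead factor are assumptions chosen for the example.

```python
"""Rough VRAM estimate for Llama 3 8B at several weight precisions.

A sketch, not ToolJolt's calculator: the FP16 KV cache, 8K context and
10% overhead factor are assumptions.
"""

GIB = 1024 ** 3

# Bytes per weight for each precision (INT4 packs two weights per byte,
# ignoring the small extra cost of quantization scales).
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

# Architecture figures from the published Llama 3 8B config.json.
PARAMS = 8.03e9
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def estimate_vram_gib(precision: str, context_len: int = 8192, batch: int = 1,
                      kv_bytes: float = 2.0, overhead_frac: float = 0.10) -> float:
    """Return estimated VRAM in GiB: weights + KV cache + flat overhead."""
    weights = PARAMS * BYTES_PER_PARAM[precision]
    # Factor 2: one K and one V vector per KV head, per layer, per token.
    kv_cache = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * kv_bytes * context_len * batch
    total = (weights + kv_cache) * (1 + overhead_frac)
    return total / GIB


for p in BYTES_PER_PARAM:
    print(f"{p}: {estimate_vram_gib(p):.1f} GiB")
```

Quantizing the weights shrinks only the weight term; under these assumptions the KV cache stays at 1 GiB (128 KiB per token × 8,192 tokens) because it is kept at FP16.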

Llama 3 70B VRAM Calculator

Estimate the GPU memory needed to run Llama 3 70B (weights, KV cache and overhead at FP16, INT8 or INT4), with a verdict on which GPUs it fits.
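A fits-on-which-GPU verdict boils down to comparing the estimated total with each card's memory. The sketch below does that for Llama 3 70B under stated assumptions: roughly 70.6B parameters, 80 layers with 8 KV heads of dimension 128, an FP16 KV cache at 8,192 tokens, a flat 10% overhead, and nominal capacities of 24, 48 and 80 GiB. The example card names are illustrative, and real usable memory is somewhat lower than the nominal figure.

```python
"""Which single GPU can hold Llama 3 70B? A rough fits/doesn't-fit check.

Assumptions: ~70.6B parameters, FP16 KV cache at 8K context, a flat 10%
overhead, and the nominal GPU capacities listed below.
"""

GIB = 1024 ** 3
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

PARAMS = 70.6e9
# Llama 3 70B: 80 layers, 8 KV heads (grouped-query attention), head_dim 128.
# Factor 2 for K and V, final 2 for FP16 bytes per element.
KV_BYTES_PER_TOKEN = 2 * 80 * 8 * 128 * 2

# Illustrative single-GPU capacities in GiB (usable memory is a bit lower).
GPUS = {
    "24 GB card (e.g. RTX 4090)": 24,
    "48 GB card": 48,
    "80 GB card (e.g. H100 80GB)": 80,
}


def total_gib(precision: str, context_len: int = 8192, overhead_frac: float = 0.10) -> float:
    """Weights + KV cache + flat overhead, in GiB."""
    weights = PARAMS * BYTES_PER_PARAM[precision]
    kv = KV_BYTES_PER_TOKEN * context_len
    return (weights + kv) * (1 + overhead_frac) / GIB


def verdict(precision: str) -> dict[str, bool]:
    """Map each GPU name to whether the estimate fits in its capacity."""
    need = total_gib(precision)
    return {name: need <= cap for name, cap in GPUS.items()}


for p in BYTES_PER_PARAM:
    print(f"{p}: {total_gib(p):.1f} GiB ->", verdict(p))
```

Under these assumptions FP16 needs about 147 GiB (no single card), INT8 about 75 GiB (only the 80 GB card) and INT4 about 39 GiB (the 48 GB and 80 GB cards).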

Mistral 7B VRAM Calculator

Estimate the GPU memory needed to run Mistral 7B (weights, KV cache and overhead at FP16, INT8 or INT4), with a verdict on which GPUs it fits.

Qwen2.5 7B VRAM Calculator

Estimate the GPU memory needed to run Qwen2.5 7B (weights, KV cache and overhead at FP16, INT8 or INT4), with a verdict on which GPUs it fits.

Custom LLM VRAM Calculator (Any Architecture)

Build a full inference-memory budget (weights, KV cache and overhead) for any transformer from its raw config.json fields.
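Working from raw config.json fields means deriving the parameter count yourself. This is a minimal sketch, assuming a Llama-style decoder (gated MLP with gate, up and down projections, RMSNorm, no linear biases); models with biases or a different block layout need a different formula. The field names are the usual Hugging Face ones, `param_count` and `budget_gib` are hypothetical helpers, and the FP16 KV cache and 10% overhead are assumptions. Fed Llama 3 8B's config values, the formula reproduces the familiar ~8.03B parameter count.

```python
"""Sketch: inference-memory budget from raw config.json fields.

Assumes a Llama-style decoder (gated MLP, RMSNorm, no linear biases);
other architectures need a different parameter formula.
"""
import json

GIB = 1024 ** 3

# Values from Llama 3 8B's published config.json (only the fields used here).
CONFIG_JSON = """{
  "hidden_size": 4096, "intermediate_size": 14336, "num_hidden_layers": 32,
  "num_attention_heads": 32, "num_key_value_heads": 8,
  "vocab_size": 128256, "tie_word_embeddings": false
}"""


def head_shape(cfg: dict) -> tuple[int, int, int]:
    """Return (query heads, KV heads, head_dim), applying the usual defaults."""
    q_heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", q_heads)  # absent means plain MHA
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // q_heads)
    return q_heads, kv_heads, head_dim


def param_count(cfg: dict) -> int:
    """Hypothetical helper: count parameters of a Llama-style decoder."""
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    q_heads, kv_heads, head_dim = head_shape(cfg)
    q_dim, kv_dim = q_heads * head_dim, kv_heads * head_dim
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v and o projections
    mlp = 3 * h * inter                            # gate, up and down projections
    norms = 2 * h                                  # two RMSNorm weight vectors per layer
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg.get("tie_word_embeddings", False) else embed
    # Final "+ h" is the RMSNorm applied after the last layer.
    return cfg["num_hidden_layers"] * (attn + mlp + norms) + embed + lm_head + h


def budget_gib(cfg: dict, weight_bytes: float, context_len: int, batch: int = 1,
               kv_bytes: float = 2.0, overhead_frac: float = 0.10) -> dict[str, float]:
    """Hypothetical helper: weights, KV cache and overhead, each in GiB."""
    _, kv_heads, head_dim = head_shape(cfg)
    weights = param_count(cfg) * weight_bytes
    kv = 2 * cfg["num_hidden_layers"] * kv_heads * head_dim * kv_bytes * context_len * batch
    overhead = (weights + kv) * overhead_frac
    return {"weights": weights / GIB, "kv_cache": kv / GIB, "overhead": overhead / GIB}


cfg = json.loads(CONFIG_JSON)
print(f"{param_count(cfg):,} parameters")
print(budget_gib(cfg, weight_bytes=0.5, context_len=32768))  # INT4 weights, 32K context
```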

KV-Cache Size Calculator (Any Model)

Compute per-token and total KV-cache memory for any model from its architecture fields; this is the workhorse for long-context budgeting.
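The KV-cache arithmetic itself is short: each token stores one K and one V vector per KV head in every layer, so bytes per token = 2 × layers × KV heads × head_dim × bytes per element, and the total scales linearly with sequence length and batch size. The sketch below assumes a plain FP16 cache with no sliding window or cache quantization. The layer and head counts are taken from the models' published configs and are worth checking against the checkpoint you actually run; the 32,768-token length is illustrative and may exceed a given model's native context.

```python
"""Per-token and total KV-cache size from architecture fields.

Sketch assumptions: a standard (non-sliding-window) KV cache stored at
FP16, i.e. 2 bytes per element unless kv_bytes says otherwise.
"""
from typing import NamedTuple

KIB, GIB = 1024, 1024 ** 3


class Arch(NamedTuple):
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_bytes_per_token(a: Arch, kv_bytes: float = 2.0) -> float:
    # Factor 2: one K and one V vector per KV head, per layer.
    return 2 * a.num_layers * a.num_kv_heads * a.head_dim * kv_bytes


def kv_total_bytes(a: Arch, seq_len: int, batch: int = 1, kv_bytes: float = 2.0) -> float:
    # Total cache grows linearly with tokens held and with batch size.
    return kv_bytes_per_token(a, kv_bytes) * seq_len * batch


MODELS = {
    "Llama 3 8B": Arch(num_layers=32, num_kv_heads=8, head_dim=128),
    "Llama 3 70B": Arch(num_layers=80, num_kv_heads=8, head_dim=128),
    "Qwen2.5 7B": Arch(num_layers=28, num_kv_heads=4, head_dim=128),
}

for name, arch in MODELS.items():
    per_tok = kv_bytes_per_token(arch) / KIB
    total = kv_total_bytes(arch, seq_len=32768) / GIB
    print(f"{name}: {per_tok:.0f} KiB/token, {total:.2f} GiB at 32,768 tokens")
```

Grouped-query attention is what keeps these numbers manageable: with only 4 KV heads, Qwen2.5 7B needs less than half the per-token cache of Llama 3 8B (56 KiB versus 128 KiB).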

1,000+ more free tools

Every tool on ToolJolt is free, runs in your browser and needs no sign-up.