⚡ ToolJolt · Free Web Story

How Much VRAM Do You Need to Run an LLM?

Weights, KV cache and the GPU that actually fits — free calculators for Llama, Mistral, Qwen and more.


Llama 3 8B VRAM Calculator

Estimate GPU memory to run Llama 3 8B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.
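As a rough illustration of the weights part of that estimate, the sketch below multiplies a parameter count by the bytes per weight at each precision and checks the result against a GPU size. The parameter count (about 8.03 billion), the function names and the 24 GiB card are assumptions chosen for the example, not values taken from the calculator. KV cache and runtime overhead come on top of these figures.

```python
# Bits per weight for each precision named in the text.
BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}

# Assumed parameter count for Llama 3 8B (about 8.03 billion).
LLAMA3_8B_PARAMS = 8_030_000_000


def weight_bytes(num_params: int, precision: str) -> int:
    """Memory taken by the weights alone, in bytes."""
    return num_params * BITS_PER_WEIGHT[precision] // 8


def fits(total_bytes: int, gpu_gib: int) -> bool:
    """True if the estimate fits in a GPU with gpu_gib GiB of memory.

    Real usable memory is lower than the nominal size, so treat a
    narrow pass as a probable fail.
    """
    return total_bytes <= gpu_gib * 2**30


if __name__ == "__main__":
    for precision in BITS_PER_WEIGHT:
        size = weight_bytes(LLAMA3_8B_PARAMS, precision)
        verdict = fits(size, 24)  # hypothetical 24 GiB card
        print(f"{precision}: {size / 2**30:.2f} GiB of weights, "
              f"fits in 24 GiB before KV cache and overhead: {verdict}")
```

Quantization formats usually store scales and other metadata alongside the weights, so real INT8 and INT4 files tend to be somewhat larger than this lower bound.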


Llama 3 70B VRAM Calculator

Estimate GPU memory to run Llama 3 70B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.


Mistral 7B VRAM Calculator

Estimate GPU memory to run Mistral 7B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.


Qwen2.5 7B VRAM Calculator

Estimate GPU memory to run Qwen2.5 7B — weights, KV cache and overhead at FP16/INT8/INT4, with a fits-on-which-GPU verdict.


Custom LLM VRAM Calculator (Any Architecture)

Full inference-memory budget for ANY transformer from raw config.json fields — weights, KV cache, overhead.
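To show how raw config.json fields can turn into a weights figure, here is a minimal sketch that counts parameters for a Llama-style decoder. It assumes no bias terms, a gated MLP with three projection matrices, and two norm vectors per layer. Other architectures need different formulas. The function name is hypothetical, and the sample values are the commonly published Llama 3 8B configuration, included as an assumption for illustration.

```python
def count_params(cfg: dict) -> int:
    """Approximate parameter count for a Llama-style decoder-only model."""
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    heads = cfg["num_attention_heads"]
    # Grouped-query attention: fewer K/V heads than query heads.
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = h // heads
    kv_dim = kv_heads * head_dim

    attn = 2 * h * h + 2 * h * kv_dim      # Q and O, plus K and V projections
    mlp = 3 * h * cfg["intermediate_size"]  # gate, up and down projections
    norms = 2 * h                           # two norm vectors per layer
    embed = cfg["vocab_size"] * h
    # An untied output head adds a second vocab-by-hidden matrix.
    lm_head = 0 if cfg.get("tie_word_embeddings", False) else embed

    return embed + layers * (attn + mlp + norms) + h + lm_head


# Assumed values, as commonly published for Llama 3 8B.
LLAMA3_8B = {
    "vocab_size": 128256,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
}

if __name__ == "__main__":
    n = count_params(LLAMA3_8B)
    print(f"{n:,} parameters, {n * 2 / 2**30:.2f} GiB at FP16")
```

Multiplying the count by the bytes per weight gives the weights term of the budget. KV cache and overhead are then added separately.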


KV-Cache Size Calculator (Any Model)

Generic per-token and total KV-cache memory from architecture fields — the long-context budgeting workhorse.
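A common way to estimate this is two tensors (keys and values) per layer, each holding one vector per KV head for every token. The sketch below follows that formula. The function names are hypothetical, and the example values (32 layers, 8 KV heads, head dimension 128, 2 bytes per value) are an assumed Llama 3 8B style configuration at FP16.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, bytes_per_value: int = 2) -> int:
    """KV-cache bytes needed for one token across all layers.

    The leading factor 2 covers the separate key and value tensors.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_bytes_total(per_token: int, context_tokens: int,
                   batch_size: int = 1) -> int:
    """KV-cache bytes for a full context, scaled by concurrent sequences."""
    return per_token * context_tokens * batch_size


if __name__ == "__main__":
    per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
    total = kv_bytes_total(per_token, context_tokens=8192)
    print(f"{per_token / 1024:.0f} KiB per token, "
          f"{total / 2**30:.2f} GiB at 8192 tokens")
```

Because the total grows linearly with both context length and batch size, the KV cache can overtake the weights in long-context or high-concurrency serving.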


1,000+ more free tools

Every tool on ToolJolt is free, runs in your browser and needs no sign-up.
