AWQ 4-bit Model Size Calculator

Activation-aware 4-bit (AWQ) checkpoint sizing for vLLM/TensorRT-LLM deployments.

Model parameters (billions)e.g. 7.24 for Mistral 7B, 70.6 for Llama 3 70BRuntime + cache headroom (GB)KV cache + compute buffers; raise for long contexts

—

Weights / file size (GB)

—

Total memory to run (GB)

Formula

size(GB) = params × 4.15 bits ÷ 8 ÷ 10⁹ (4.15 = measured effective bits/weight for this format, incl. scales)

References: llama.cpp quantization documentation (k-quants); Frantar et al. (2022), GPTQ; Lin et al. (2023), AWQ; NVIDIA FP8 Transformer Engine docs

Disclaimer: This tool is for general informational and estimation purposes only and is not professional financial, tax, accounting or legal advice. All figures are estimates — verify with a qualified professional before making decisions. Read the full disclaimer.

About AWQ 4-bit Model Size Calculator

AWQ protects the ~1% of weight channels with the largest activation magnitudes by scaling them before 4-bit rounding — no backprop, no per-column solving. Same ~4.15 bpw as GPTQ-g128, often slightly better accuracy on instruction-tuned models. This calculator turns any parameter count into a concrete file size and a realistic total-memory figure, so you can check whether a given checkpoint fits your GPU VRAM or system RAM before downloading tens of gigabytes. It uses the measured effective bits-per-weight of the format — including block scales and mixed-precision tensor exceptions — rather than the marketing bit-width.

How to use AWQ 4-bit Model Size Calculator

1Enter your values into AWQ 4-bit Model Size Calculator — sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2The result recomputes live using the formula shown on the page; there is no button to press.
3Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use AWQ 4-bit Model Size Calculator?

✓Computes AWQ 4-bit Model Size instantly in your browser — no sign-up, no upload, no server round-trip.
✓100% free and unlimited, with the exact formula shown: size(GB) = params × 4.15 bits ÷ 8 ÷ 10⁹ (4.15 = measured effective bits/weight for this format, incl. scales).
✓Runs entirely client-side, so every value you enter stays private on your device.
✓Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

AWQ vs GPTQ — practical difference?+

Both land at ~4.15 bpw. AWQ tends to preserve instruction-following slightly better (it never reconstructs against a dataset, reducing calibration overfit) and quantizes faster; GPTQ has broader kernel coverage on older stacks. On vLLM both are first-class.

Why 'activation-aware'?+

AWQ observes which input channels carry large activations at calibration time and rescales those channels so 4-bit rounding loses less of what the network actually uses. The insight: protecting 1% of salient channels recovers most of the quality gap.

Can I fine-tune an AWQ model?+

Not directly — AWQ checkpoints are inference artifacts. Fine-tune the FP16/BF16 base (or use QLoRA on NF4), then re-quantize with AWQ afterward. Re-run calibration on data resembling your fine-tune domain for best results.

What VRAM does a 70B AWQ need?+

Weights ≈ 70.6e9 × 4.15/8 ≈ 36.6 GB — runnable on a 48 GB A6000/L40S with healthy cache room, or on 2× 24 GB consumer cards with tensor parallelism. This calculator lets you check any parameter count instantly.

Related tools

Related ML & AI tools

🧠

ROC-AUC Calculator (from TPR/FPR points)

Trapezoidal area under the ROC curve from your (FPR, TPR) operating points — the threshold-independent ranking score.

● Live

🧠

Classification Threshold Cost Calculator

Find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.

● Live

🧠

Silhouette Score Calculator

Cluster cohesion vs separation for one point — the building block of the silhouette metric for choosing K.

● Live