
Tensor Memory Size Calculator

Bytes of any tensor from its shape and dtype — parse '64, 3, 224, 224' and compare FP32/BF16/INT8/INT4.


The default — one training batch of ImageNet-sized images — is 38.5 MB at FP32. Multiply by the dozens of activation tensors a deep network keeps for backprop and GPU memory vanishes fast.

Formula

bytes = ∏ dims × sizeof(dtype) — accepts shapes separated by commas, spaces or ×
References: PyTorch tensor dtype documentation
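The formula can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the names `DTYPE_BYTES`, `parse_shape` and `tensor_bytes` are hypothetical, and accepting an ASCII `x` as a separator is an added assumption beyond the commas, spaces and × named above.

```python
import math
import re

# Bytes per element. INT4 is the idealized packed size (two values per byte).
DTYPE_BYTES = {"FP64": 8, "FP32": 4, "BF16": 2, "FP16": 2, "INT8": 1, "INT4": 0.5}

def parse_shape(text):
    """Split a shape string on commas, whitespace, '×' or ASCII 'x' into ints."""
    parts = [p for p in re.split(r"[,\s×xX]+", text.strip()) if p]
    return [int(p) for p in parts]

def tensor_bytes(shape, dtype="FP32"):
    """bytes = product of dims × sizeof(dtype)."""
    return math.prod(shape) * DTYPE_BYTES[dtype]

shape = parse_shape("64, 3, 224, 224")
for name in ("FP32", "BF16", "INT8", "INT4"):
    # Decimal megabytes (1 MB = 10**6 bytes), matching the 38.5 MB default.
    print(f"{name}: {tensor_bytes(shape, name) / 1e6:.1f} MB")
```

The default shape has 9,633,792 elements, so FP32 comes to 38,535,168 bytes and each narrower dtype scales that down in proportion.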

About Tensor Memory Size Calculator

Every out-of-memory error is, at bottom, a failure of this multiplication: product of dimensions times bytes per element. This calculator parses any shape string (commas, spaces or ×) and prices it across the dtype menu from FP64 down to INT4. It is deliberately simple because the skill it builds is the estimation habit: knowing instantly that a 64×3×224×224 batch is about 38.5 MB at FP32, that BF16 halves it, and that an 8K-token (8192×8192) attention score matrix at FP32 would take 256 MiB (about 268 MB) per head if it were materialized in full, which FlashAttention avoids.
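The 8K-token attention figure comes from the same multiplication applied to a square score matrix. A minimal sketch, assuming one seq_len × seq_len matrix per head and a hypothetical helper name; it also shows that the result reads as 256 in binary units (MiB) and about 268 in decimal MB:

```python
def attention_scores_bytes(seq_len, bytes_per_element=4):
    """Size of one fully materialized seq_len x seq_len score matrix (one head)."""
    return seq_len * seq_len * bytes_per_element

b = attention_scores_bytes(8192)   # 8K tokens, FP32
print(f"{b} bytes")
print(f"{b / 2**20:.0f} MiB")      # binary mebibytes
print(f"{b / 1e6:.1f} MB")         # decimal megabytes
```

The cost grows with the square of the sequence length, so doubling the context quadruples this tensor.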

How to use Tensor Memory Size Calculator

  1. Enter your values into Tensor Memory Size Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
  2. The result recomputes live using the formula shown on the page; there is no button to press.
  3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use Tensor Memory Size Calculator?

  • Computes Tensor Memory Size instantly in your browser — no sign-up, no upload, no server round-trip.
  • 100% free and unlimited, with the exact formula shown: bytes = ∏ dims × sizeof(dtype) — accepts shapes separated by commas, spaces or ×.
  • Runs entirely client-side, so every value you enter stays private on your device.
  • Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

Why does my GPU OOM when my tensors look small?

Backprop keeps most intermediate activations alive until backward completes, so dozens of layer outputs are resident at once, not one. Add gradients (same size as weights), optimizer states (2× weights for Adam, which stores two moment estimates per parameter) and allocator fragmentation, and 'small' tensors compound into gigabytes.
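The accounting in this answer can be sketched as a rough budget. This is an illustrative estimate only: the function name, the 25M-parameter model and the count of 50 saved activation tensors are hypothetical, everything is assumed to be FP32, and allocator fragmentation is not modeled.

```python
def training_memory_bytes(n_params, activation_bytes, bytes_per_param=4):
    """Rough training footprint: weights + gradients + Adam states + activations."""
    weights = n_params * bytes_per_param
    gradients = weights            # one gradient per weight, same dtype
    optimizer = 2 * weights        # Adam: two moment estimates per parameter
    total = weights + gradients + optimizer + activation_bytes
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activation_bytes,
        "total": total,
    }

# Hypothetical example: 25M parameters, 50 saved activations of 38,535,168 bytes each.
budget = training_memory_bytes(25_000_000, 50 * 38_535_168)
for name, size in budget.items():
    print(f"{name:12s} {size / 1e9:.2f} GB")
```

In this made-up case the activations account for most of the total, which is why a batch that looks small on its own can still exhaust the device.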

BF16 vs FP16: same memory, what's the difference?

Both are 2 bytes. BF16 keeps FP32's 8 exponent bits (same range, less precision) so it rarely overflows and usually trains without loss scaling; FP16 has more mantissa bits (10 versus BF16's 7) but only 5 exponent bits, so its largest finite value is 65,504. Memory math is identical, so choose by numerical behavior and hardware.
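The range and precision trade-off follows directly from the bit layouts. A small sketch, assuming the standard IEEE-style layout (1 sign bit, all-ones exponent reserved for inf/NaN); `max_finite` and `FORMATS` are illustrative names:

```python
def max_finite(exponent_bits, mantissa_bits):
    """Largest finite value for an IEEE-style binary float with these field widths."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent field encodes inf/NaN, so the largest usable
    # unbiased exponent equals the bias.
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias

# (exponent bits, mantissa bits); each format also has 1 sign bit.
FORMATS = {"FP32": (8, 23), "BF16": (8, 7), "FP16": (5, 10)}

for name, (e, m) in FORMATS.items():
    total_bits = 1 + e + m
    spacing = 2.0 ** -m  # gap between 1.0 and the next representable value
    print(f"{name}: {total_bits} bits, max {max_finite(e, m):.4g}, step at 1.0 = {spacing:.3g}")
```

BF16 and FP16 both total 16 bits, but BF16 reaches roughly the same maximum as FP32 while FP16 tops out at 65,504 with finer spacing between neighboring values.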

How is INT4 half a byte stored?

Two 4-bit values pack into each byte, plus per-group scale factors stored separately (see our quantization calculators for the real effective rates ~4.1–4.8 bits). This tool's 0.5 B option gives the idealized packed size.
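Packing and the scale-factor overhead can both be sketched briefly. The nibble order (low nibble first), the unsigned 0..15 value range, the FP16 scale width and the group sizes below are assumptions chosen for illustration; real quantization formats differ in these details.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first."""
    values = list(values)
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    if len(values) % 2:
        values.append(0)  # pad an odd-length input with a zero nibble
    return bytes((hi << 4) | lo for lo, hi in zip(values[0::2], values[1::2]))

def unpack_int4(packed, count):
    """Inverse of pack_int4; count drops any padding nibble."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)
        values.append(byte >> 4)
    return values[:count]

def effective_bits(group_size, scale_bits=16, weight_bits=4):
    """Bits per weight once one scale per group is amortized over the group."""
    return weight_bits + scale_bits / group_size

packed = pack_int4([1, 2, 3, 4, 5])
print(len(packed), unpack_int4(packed, 5))
print(effective_bits(128), effective_bits(32))
```

With one 16-bit scale per group, a group of 128 costs 4.125 bits per weight and a group of 32 costs 4.5, which is how the effective rate rises above the idealized 4 bits.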

Do strides and views change memory use?

Views, slices and transposes share the original storage, so they cost no extra memory. But .contiguous() on a non-contiguous tensor, reshape calls whose result cannot be expressed as a view, and out-of-place elementwise ops each allocate a new full-size storage. When tracing an OOM, count materialized storages, not tensor objects.
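The view-versus-copy distinction can be illustrated without a tensor library. The sketch below is an analogy only, using Python's standard `array` and `memoryview` in place of tensors and storages; it does not show PyTorch's API.

```python
from array import array

# One "storage": 1024 single-precision floats.
storage = array("f", [0.0] * 1024)
storage_bytes = len(storage) * storage.itemsize

# A slice of a memoryview shares the underlying buffer (like a tensor view).
view = memoryview(storage)[:512]

# tobytes() materializes a separate buffer (like a copying operation).
materialized = view.tobytes()

storage[0] = 1.0
print(view.obj is storage)   # the view still points at the original buffer
print(view[0])               # the write is visible through the view
print(storage_bytes, view.nbytes, len(materialized))
```

Only `materialized` adds memory here. The view reports a byte length, but those bytes are already counted in the original buffer.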
