Matrix Multiplication FLOPs Calculator
Shape check + exact FLOPs/memory of (M×K)·(K×N) — and whether the matmul is compute- or bandwidth-bound on your GPU.
Set M=1 to see decode-time reality: at BF16, a 1×4096 × 4096×4096 GEMV has intensity ~1 FLOP/byte. That is hopelessly bandwidth-bound, which is the entire reason batching exists.
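Formula
FLOPs = 2·M·K·N
Minimum bytes = (M·K + K·N + M·N) × bytes per element
Arithmetic intensity = FLOPs / minimum bytes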
About Matrix Multiplication FLOPs Calculator
2MKN: three numbers and a doubling, yet this formula plus the roofline model explains most of modern AI systems engineering. This calculator gives the FLOPs of any matrix product, the minimum memory traffic, and their ratio: arithmetic intensity. Compare that intensity against your GPU's FLOPS-to-bandwidth ratio (H100: ~295 FLOP/byte at BF16) and you know immediately whether the operation can saturate the tensor cores or will idle waiting on HBM. The M=1 case is the punchline: single-token LLM decode is a GEMV with intensity ~1 FLOP/byte at BF16, which is why batching, speculative decoding and weight quantization dominate inference engineering.
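The three quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `matmul_stats` and the `bytes_per_element` parameter are assumptions, and "minimum memory traffic" is taken to mean reading each input matrix once and writing the output once.

```python
def matmul_stats(m: int, k: int, n: int, bytes_per_element: int = 2) -> dict:
    """FLOPs, minimum memory traffic and arithmetic intensity of (M x K) . (K x N).

    bytes_per_element=2 corresponds to BF16/FP16 (an assumed default).
    """
    # Each of the M*N outputs is a length-K dot product, counted as 2K operations.
    flops = 2 * m * k * n
    # Assumed lower bound on traffic: read A and B once, write C once.
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return {
        "flops": flops,
        "bytes": bytes_moved,
        "intensity": flops / bytes_moved,  # FLOP per byte
    }


square = matmul_stats(4096, 4096, 4096)
gemv = matmul_stats(1, 4096, 4096)
print(f"square 4096^3: {square['intensity']:.0f} FLOP/byte")  # 1365
print(f"M=1 GEMV:      {gemv['intensity']:.2f} FLOP/byte")    # 1.00
```

With 2-byte elements the M=1 case lands at about 1 FLOP/byte; the same shape with 1-byte elements would come out near 2, because the byte count halves while the FLOP count stays fixed.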
How to use Matrix Multiplication FLOPs Calculator
1. Enter your values into Matrix Multiplication FLOPs Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2. The result recomputes live using the formula shown on the page; there is no button to press.
3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.
Why use Matrix Multiplication FLOPs Calculator?
- ✓ Computes Matrix Multiplication FLOPs instantly in your browser: no sign-up, no upload, no server round-trip.
- ✓ 100% free and unlimited, with the exact formula shown: FLOPs = 2·M·K·N.
- ✓ Runs entirely client-side, so every value you enter stays private on your device.
- ✓ Live recompute as you type, with a worked example and authoritative references for trust.
Frequently asked questions
Why 2·M·K·N and not M·K·N?
Each output element is a length-K dot product: K multiplies and K−1 adds ≈ 2K operations, times M·N outputs. Hardware specs and papers consistently use this 2× convention, while 'MACs' counts multiply-accumulates without it — know which one a number means.
What makes a matmul compute-bound?
Arithmetic intensity above the hardware's FLOPS/bandwidth ratio. Square 4096³ matmuls at BF16 reach ~1365 FLOP/byte — comfortably compute-bound everywhere. Skinny matrices (small M or N) crater intensity, which is why kernel libraries fuse and batch them.
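The comparison against the hardware ratio can be written as a small roofline check. The sketch below is illustrative only: the function name `roofline` is hypothetical, and the peak figures (989 TFLOP/s dense BF16 and 3.35 TB/s HBM bandwidth) are assumed values chosen because their ratio reproduces the ~295 FLOP/byte quoted for the H100. Substitute your own GPU's numbers.

```python
def roofline(m, k, n, peak_tflops, bandwidth_tbs, bytes_per_element=2):
    """Classify (M x K) . (K x N) as compute- or bandwidth-bound.

    peak_tflops and bandwidth_tbs are user-supplied hardware figures (assumptions).
    Returns (bound, attainable_tflops, ridge_flop_per_byte).
    """
    flops = 2 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    intensity = flops / bytes_moved

    # TFLOP/s divided by TB/s gives FLOP/byte: the roofline "ridge point".
    ridge = peak_tflops / bandwidth_tbs
    # Below the ridge, throughput is capped by how fast memory can feed the math units.
    attainable = min(peak_tflops, intensity * bandwidth_tbs)
    bound = "compute" if intensity >= ridge else "bandwidth"
    return bound, attainable, ridge


# Assumed H100-like figures; replace with your hardware's specs.
PEAK_TFLOPS, BANDWIDTH_TBS = 989.0, 3.35

for shape in [(4096, 4096, 4096), (8, 4096, 4096)]:
    bound, attainable, ridge = roofline(*shape, PEAK_TFLOPS, BANDWIDTH_TBS)
    print(shape, bound, f"{attainable:.1f} TFLOP/s attainable, ridge {ridge:.0f}")
```

The attainable figure is an upper bound under the minimum-traffic assumption. Real kernels usually land below it because of tiling, cache misses and launch overhead.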
Why is LLM decoding bandwidth-bound?
Generating one token multiplies a 1×H activation by every weight matrix: M=1 GEMVs with intensity ≈ 1 FLOP/byte at BF16. The GPU must stream all weights from HBM while using well under 1% of its peak compute (on an H100, roughly 1/295 ≈ 0.3%). Batching B requests raises M to B, and intensity grows almost linearly with B while B is small relative to H.
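To see how batching recovers intensity, the sketch below sweeps the batch size for a single square H×H weight matrix and finds the smallest batch that reaches a given ridge point. It is a simplified model under stated assumptions: one weight matrix only, 2-byte elements, no KV-cache traffic, and a ridge of 295 FLOP/byte taken from the H100 figure quoted on this page. The function names are hypothetical.

```python
def decode_intensity(batch: int, hidden: int, bytes_per_element: int = 2) -> float:
    """Arithmetic intensity of a (batch x H) . (H x H) matmul in FLOP/byte."""
    flops = 2 * batch * hidden * hidden
    # Activations in, weights, activations out; each touched once (assumed minimum).
    elements = batch * hidden + hidden * hidden + batch * hidden
    return flops / (elements * bytes_per_element)


def min_batch_for_ridge(hidden: int, ridge: float, max_batch: int = 1 << 16):
    """Smallest batch whose intensity reaches the ridge, or None if not found."""
    for batch in range(1, max_batch + 1):
        if decode_intensity(batch, hidden) >= ridge:
            return batch
    return None


HIDDEN = 4096
RIDGE = 295.0  # assumed H100 BF16 ridge point in FLOP/byte

for batch in (1, 8, 64, 256, 512):
    print(f"B={batch:4d}  intensity={decode_intensity(batch, HIDDEN):7.2f} FLOP/byte")

print("smallest compute-bound batch:", min_batch_for_ridge(HIDDEN, RIDGE))
```

Intensity here equals B·H / (H + 2B), so it tracks B closely for small batches and flattens toward H/2 as B grows. In a real decoder the attention step reads a per-request KV cache that batching does not amortize, so the crossover batch in practice is typically higher than this model suggests.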
Does this cover attention and convolutions too?
Yes — both lower to matmuls. Attention is two batched matmuls (see our attention FLOPs tool); a conv is an implicit GEMM of the im2col matrix. The same intensity analysis explains why depthwise convs and small attention heads underutilize GPUs.
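As a rough illustration of the conv-to-GEMM mapping, the sketch below derives the equivalent M, K and N for a 2D convolution under the usual im2col layout. The function `conv2d_as_gemm` and its parameter names are hypothetical, and a grouped convolution is modelled as `groups` independent GEMMs.

```python
def conv2d_as_gemm(batch, c_in, c_out, h_out, w_out, kh, kw, groups=1):
    """Equivalent per-group GEMM shape (M, K, N) and total FLOPs of a 2D convolution.

    Assumes the im2col view: one row per output pixel, one column per
    (input channel, kernel position) pair within a group.
    """
    m = batch * h_out * w_out          # one GEMM row per output position
    k = (c_in // groups) * kh * kw     # reduction length per output value
    n = c_out // groups                # output channels produced per group
    flops = groups * 2 * m * k * n     # 2*M*K*N for each group's GEMM
    return m, k, n, flops


# Dense 3x3 conv, 64 -> 64 channels, 56x56 output (illustrative sizes).
print(conv2d_as_gemm(1, 64, 64, 56, 56, 3, 3))
# Depthwise version of the same layer: groups == channels.
print(conv2d_as_gemm(1, 64, 64, 56, 56, 3, 3, groups=64))
```

In the depthwise case each group's GEMM collapses to K = 9 and N = 1, a very skinny product. That is the shape-level reason depthwise convolutions tend to leave matrix units underused.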
Related ML & AI tools
ROC-AUC Calculator (from TPR/FPR points)
Trapezoidal area under the ROC curve from your (FPR, TPR) operating points — the threshold-independent ranking score.
Classification Threshold Cost Calculator
Find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.
Silhouette Score Calculator
Cluster cohesion vs separation for one point — the building block of the silhouette metric for choosing K.