Expected Calibration Error (ECE) Calculator

Weighted gap between predicted confidence and actual accuracy across bins — the standard model-calibration metric.

Inputs (one value per bin): mean confidence, actual accuracy, samples

Outputs: expected calibration error (ECE), max calibration error (MCE)

A perfectly calibrated model's 80%-confidence predictions are right 80% of the time. Modern deep networks are typically overconfident (confidence > accuracy), which temperature scaling often corrects cheaply. ECE quantifies the average miscalibration, weighted by bin size; MCE reports the worst bin.

Formula

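ECE = Σₙ (binₙ / N) · |confidenceₙ − accuracyₙ|
MCE = maxₙ |confidenceₙ − accuracyₙ|

Here binₙ is the number of samples in bin n, N is the total number of samples, and confidenceₙ and accuracyₙ are the mean confidence and actual accuracy of bin n.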
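The formula can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own source: the function name `ece_mce`, the choice to skip empty bins, and the three-bin numbers are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def ece_mce(
    confidences: Sequence[float],
    accuracies: Sequence[float],
    counts: Sequence[int],
) -> Tuple[float, float]:
    """Return (ECE, MCE) from per-bin summaries.

    confidences[n]: mean predicted confidence in bin n
    accuracies[n]:  fraction of correct predictions in bin n
    counts[n]:      number of samples in bin n
    """
    if not (len(confidences) == len(accuracies) == len(counts)):
        raise ValueError("all three inputs need one entry per bin")
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one sample")

    ece = 0.0
    mce = 0.0
    for conf, acc, n in zip(confidences, accuracies, counts):
        if n == 0:
            continue  # assumption: empty bins are ignored for both ECE and MCE
        gap = abs(conf - acc)
        ece += (n / total) * gap  # weight each gap by the bin's share of samples
        mce = max(mce, gap)       # track the worst single bin
    return ece, mce


# Hypothetical three-bin input (numbers invented for illustration).
ece, mce = ece_mce(
    confidences=[0.55, 0.75, 0.95],
    accuracies=[0.50, 0.70, 0.80],
    counts=[100, 300, 600],
)
print(f"ECE = {ece:.3f}, MCE = {mce:.3f}")  # ECE = 0.110, MCE = 0.150
```

In this made-up case the largest bin is also the most overconfident (0.95 confidence against 0.80 accuracy), so it contributes 0.09 of the 0.11 total and sets the MCE on its own.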

References: Guo et al. (2017), On Calibration of Modern Neural Networks; Naeini et al. (2015), Obtaining Well Calibrated Probabilities Using Bayesian Binning

About Expected Calibration Error (ECE) Calculator

A model that says '90% confident' should be right 90% of the time, but modern deep networks are often overconfident, claiming 99% certainty on predictions that are right far less often. Expected Calibration Error quantifies this: it bins predictions by confidence, measures the gap between each bin's average confidence and its actual accuracy, and weights those gaps by bin size. This calculator computes ECE and the worst-bin Maximum Calibration Error from your reliability-diagram bins. ECE is the metric to watch when probabilities drive downstream decisions, and the one that temperature scaling is typically used to reduce.
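If you start from raw predictions rather than ready-made bins, the binning step described above might look like the following sketch. It assumes equal-width bins over [0, 1] that are closed on the left, with a confidence of exactly 1.0 placed in the last bin; the function name `bin_predictions` and the sample data are hypothetical, and other conventions for bin edges exist.

```python
from typing import List, Sequence, Tuple


def bin_predictions(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> Tuple[List[float], List[float], List[int]]:
    """Group predictions into equal-width confidence bins over [0, 1].

    Returns per-bin (mean confidence, accuracy, sample count), skipping empty bins.
    """
    conf_sums = [0.0] * num_bins
    hits = [0] * num_bins
    counts = [0] * num_bins
    for conf, ok in zip(confidences, correct):
        # Assumption: bins are [k/num_bins, (k+1)/num_bins); 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        conf_sums[idx] += conf
        hits[idx] += int(ok)
        counts[idx] += 1

    mean_conf: List[float] = []
    accuracy: List[float] = []
    sizes: List[int] = []
    for s, h, n in zip(conf_sums, hits, counts):
        if n:
            mean_conf.append(s / n)
            accuracy.append(h / n)
            sizes.append(n)
    return mean_conf, accuracy, sizes


# Hypothetical predictions: top-class confidence and whether the prediction was right.
mean_conf, accuracy, sizes = bin_predictions(
    confidences=[0.95, 0.92, 0.65, 0.61],
    correct=[True, False, True, True],
)
```

The three returned lists correspond to the three inputs this calculator asks for: mean confidence, actual accuracy, and samples per bin.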

How to use Expected Calibration Error (ECE) Calculator

1. Enter your values into the Expected Calibration Error (ECE) Calculator. Sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2. The result recomputes live using the formula shown on the page; there is no button to press.
3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use Expected Calibration Error (ECE) Calculator?

✓ Computes Expected Calibration Error (ECE) instantly in your browser: no sign-up, no upload, no server round-trip.
✓ 100% free and unlimited, with the exact formula shown: ECE = Σ (binₙ / N) · |confidenceₙ − accuracyₙ|.
✓ Runs entirely client-side, so every value you enter stays private on your device.
✓ Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

What is a good ECE value?

Lower is better. As a rough rule of thumb, below ~0.02 (2%) is well calibrated and above ~0.1 indicates serious miscalibration needing correction. But ECE is a summary, so always inspect the reliability diagram too: over- and under-confident predictions that land in the same bin can cancel in that bin's averages, and a low ECE dominated by a few large bins can hide a badly calibrated small bin.

Why are modern neural networks overconfident?

Guo et al. (2017) linked it to increased model capacity (depth and width), batch normalization, and reduced weight decay, which let a network keep lowering its cross-entropy loss by pushing softmax outputs toward extremes even after accuracy has stopped improving. The model gets accuracy right but its probabilities saturate near 0 and 1, so a 'confidence' of 0.99 no longer corresponds to 99% empirical accuracy.

How do I fix poor calibration?

Temperature scaling is usually the cheapest fix and is often very effective: divide the logits by a single learned scalar T > 0 (fitted on a held-out validation set, typically by minimizing negative log-likelihood) before softmax. It doesn't change which class wins (so accuracy is unchanged) but softens overconfident probabilities when T > 1. Isotonic regression and Platt scaling are alternatives for harder cases.
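As a rough sketch of temperature scaling, the snippet below divides logits by T before softmax and picks T with a simple grid search on validation negative log-likelihood. The grid search is a stand-in chosen for brevity (T is more commonly fitted with a gradient-based optimizer), and the function names, the candidate range 0.5 to 5.0, and the toy logits are all assumptions made for the example.

```python
import math
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature; temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax_with_temperature(logits, temperature)[y])
    return loss / len(labels)


def fit_temperature(
    logit_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Pick T by grid search on validation NLL (illustrative, not the usual optimizer)."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]  # hypothetical grid, 0.5 to 5.0
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy validation set: the model is very confident in class 0 but right only 3 times out of 4.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)

raw = softmax_with_temperature([4.0, 1.0, 0.0])
cooled = softmax_with_temperature([4.0, 1.0, 0.0], T)
```

Because dividing every logit by the same positive T preserves their order, `raw` and `cooled` have the same winning class; only the size of the top probability changes.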

Does ECE have weaknesses?

Yes. It depends on the number and placement of bins (equal-width vs equal-mass), and because each bin compares an average confidence with an average accuracy, offsetting errors inside a bin can be masked. Adaptive-binning ECE and the related Brier score (a proper scoring rule) address some of this. Use ECE for monitoring alongside a reliability diagram, not as the only calibration signal.
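To make the equal-mass alternative concrete, here is a minimal sketch that sorts predictions by confidence and cuts them into bins of near-equal size, so no bin ends up nearly empty. The function name `equal_mass_bins` and the sample data are hypothetical, and this is only one way to implement adaptive binning.

```python
from typing import List, Sequence, Tuple


def equal_mass_bins(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> Tuple[List[float], List[float], List[int]]:
    """Split predictions, sorted by confidence, into bins of near-equal size."""
    pairs = sorted(zip(confidences, correct))
    total = len(pairs)
    mean_conf: List[float] = []
    accuracy: List[float] = []
    sizes: List[int] = []
    for b in range(num_bins):
        # Integer arithmetic spreads any remainder across the bins.
        chunk = pairs[b * total // num_bins:(b + 1) * total // num_bins]
        if not chunk:
            continue  # happens when there are fewer samples than bins
        mean_conf.append(sum(c for c, _ in chunk) / len(chunk))
        accuracy.append(sum(1 for _, ok in chunk if ok) / len(chunk))
        sizes.append(len(chunk))
    return mean_conf, accuracy, sizes


# Hypothetical predictions split into two equal-mass bins.
mean_conf, accuracy, sizes = equal_mass_bins(
    [0.9, 0.6, 0.99, 0.7], [True, False, True, True], num_bins=2
)
```

With equal-width bins, an overconfident model can pile most samples into the top bin; equal-mass bins trade fixed edges for comparable sample counts, which makes each bin's accuracy estimate similarly reliable.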

Related ML & AI tools

ROC-AUC Calculator (from TPR/FPR points)

Trapezoidal area under the ROC curve from your (FPR, TPR) operating points — the threshold-independent ranking score.

Classification Threshold Cost Calculator

Find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.

Silhouette Score Calculator

Cluster cohesion vs separation for one point — the building block of the silhouette metric for choosing K.
