Macro / Micro / Weighted F1 Calculator
Aggregate per-class precision and recall into macro, micro and weighted F1 — and see why they disagree on imbalanced data.
When classes are imbalanced these diverge sharply: a model great on the big class and poor on rare ones gets high micro/weighted but low macro F1. Report macro when minority-class performance matters; micro when every sample counts equally.
Formula
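The standard definitions, consistent with macro = mean(per-class F1), can be written as follows. The symbols are chosen here for illustration and are not taken from the page: K classes, per-class counts TP_k, FP_k, FN_k, support n_k = TP_k + FN_k, and N as the sum of all n_k.

```latex
\begin{aligned}
F1_k &= \frac{2\,TP_k}{2\,TP_k + FP_k + FN_k} \\[4pt]
\text{Macro-F1} &= \frac{1}{K}\sum_{k=1}^{K} F1_k \\[4pt]
\text{Micro-F1} &= \frac{2\sum_{k} TP_k}{2\sum_{k} TP_k + \sum_{k} FP_k + \sum_{k} FN_k} \\[4pt]
\text{Weighted-F1} &= \sum_{k=1}^{K} \frac{n_k}{N}\, F1_k
\end{aligned}
```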
About Macro / Micro / Weighted F1 Calculator
Multi-class F1 has three flavors, and choosing the wrong one can hide a broken model. Macro-F1 averages each class's F1 equally, so a rare class you're failing drags the score down honestly. Micro-F1 pools all true positives, false positives and false negatives before computing, so it tracks overall sample accuracy and lets big classes dominate. Weighted-F1 averages per-class F1 by support (the number of true samples in each class), so it stays class-based but still leans toward the big classes. This calculator computes all three from your per-class confusion counts and makes their disagreement visible, which is exactly the diagnostic you want on imbalanced data.
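As a rough illustration of the three averages, here is a minimal Python sketch that computes them from per-class confusion counts. The names `ClassCounts`, `f1_from_counts` and `f1_averages` are hypothetical, the example counts are made up, and returning 0.0 when a class has no true positives, false positives or false negatives is an assumed convention rather than the calculator's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    """Per-class confusion counts (hypothetical container for this sketch)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); assumed to be 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averages(counts: list[ClassCounts]) -> dict[str, float]:
    per_class = [f1_from_counts(c.tp, c.fp, c.fn) for c in counts]
    supports = [c.tp + c.fn for c in counts]  # true samples per class
    total_support = sum(supports)

    # Macro: every class gets the same weight.
    macro = sum(per_class) / len(per_class)

    # Micro: pool the counts first, then compute a single F1.
    micro = f1_from_counts(
        sum(c.tp for c in counts),
        sum(c.fp for c in counts),
        sum(c.fn for c in counts),
    )

    # Weighted: per-class F1 averaged by support.
    weighted = (
        sum(f * s for f, s in zip(per_class, supports)) / total_support
        if total_support
        else 0.0
    )
    return {"macro": macro, "micro": micro, "weighted": weighted}


# Made-up imbalanced example: one large class (950 samples), two rare ones (40 and 10).
example = [
    ClassCounts(tp=930, fp=38, fn=20),
    ClassCounts(tp=8, fp=16, fn=32),
    ClassCounts(tp=1, fp=7, fn=9),
]
scores = f1_averages(example)
print({name: round(value, 3) for name, value in scores.items()})
```

With these made-up counts, macro-F1 lands near 0.44 while micro-F1 and weighted-F1 sit near 0.94 and 0.93, which is the kind of gap described above.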
How to use Macro / Micro / Weighted F1 Calculator
1. Enter your values into the Macro / Micro / Weighted F1 Calculator; sensible, domain-typical defaults are pre-filled so you see a real result immediately.
2. The result recomputes live using the formula shown on the page; there is no button to press.
3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.
Why use Macro / Micro / Weighted F1 Calculator?
- Computes Macro / Micro / Weighted F1 instantly in your browser: no sign-up, no upload, no server round-trip.
- 100% free and unlimited, with the exact formula shown: macro = mean(per-class F1).
- Runs entirely client-side, so every value you enter stays private on your device.
- Live recompute as you type, with a worked example and authoritative references for trust.
Frequently asked questions
Macro vs micro F1: when do I use each?
Use macro-F1 when every class matters equally regardless of frequency (e.g. you care about rare disease subtypes as much as common ones). Use micro-F1 when every sample matters equally and you're fine with frequent classes dominating. If they disagree a lot, your model's performance is uneven across classes: check the per-class F1 scores to see which classes are failing.
Why does micro-F1 equal accuracy sometimes?
In single-label multi-class classification where each sample has exactly one prediction and one true class, micro-precision = micro-recall = micro-F1 = accuracy. They only diverge in multi-label settings or when some samples are unlabeled. So 'micro-F1' on a standard classifier is just accuracy by another name.
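The identity holds because, in single-label classification, every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled FP and FN totals both equal the number of errors. A minimal Python sketch of that check follows; the function name `micro_f1_and_accuracy` and the toy labels are made up for illustration.

```python
def micro_f1_and_accuracy(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (micro-F1, accuracy) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    # Pool one-vs-rest counts over every class.
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the truth was another one
            elif t == label:
                fn += 1  # truth was this class, but another one was predicted
    micro_f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return micro_f1, accuracy


# Toy data: 6 samples, 4 predicted correctly.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "a", "c"]
micro_f1, accuracy = micro_f1_and_accuracy(y_true, y_pred)
print(micro_f1, accuracy)  # both are 4/6 up to floating-point rounding
```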
What is weighted-F1 best for?
Reporting a single headline number that reflects real-world class frequencies while still being built from per-class F1 scores (micro pools the raw counts instead). scikit-learn's classification_report prints it as the "weighted avg" row, next to "macro avg". The risk: like accuracy, it can look good while a small but important class quietly fails, so always glance at macro too.
My macro-F1 is much lower than accuracy: is the model bad?
Not necessarily bad overall, but bad on the minority classes. Macro-F1 well below accuracy is the classic signature of a model that learned the easy majority and gave up on rare classes. Whether that's a problem depends entirely on whether those rare classes matter to you.
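The extreme case of this signature is a model that always predicts the majority class. The sketch below uses made-up data (95 "common" samples, 5 "rare" ones) and a hypothetical helper `per_class_f1`; scoring a class with no true positives as 0.0 is an assumed convention.

```python
def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """One-vs-rest F1 for each label seen in the truth or the predictions."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0  # assumed convention
    return scores


# Made-up data: the "model" never predicts the rare class.
y_true = ["common"] * 95 + ["rare"] * 5
y_pred = ["common"] * 100

f1s = per_class_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1s.values()) / len(f1s)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} per_class={f1s}")
```

Here accuracy is 0.95 while macro-F1 is about 0.49, because the rare class contributes an F1 of 0 to the unweighted mean.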
Related tools
- Balanced Accuracy & Youden's J Calculator
- Classification Metrics — Confusion Matrix & Metrics Calculator
- Spam Filter — Confusion Matrix & Metrics Calculator
- Medical Diagnostic Test — Confusion Matrix & Metrics Calculator
- Fraud Detection — Confusion Matrix & Metrics Calculator
- Customer Churn Prediction — Confusion Matrix & Metrics Calculator
- BLEU Score Calculator
- ROUGE Score Calculator
Related ML & AI tools
ROC-AUC Calculator (from TPR/FPR points)
Trapezoidal area under the ROC curve from your (FPR, TPR) operating points — the threshold-independent ranking score.
Classification Threshold Cost Calculator
Find the probability cutoff that minimizes expected cost given your false-positive and false-negative penalties.
Silhouette Score Calculator
Cluster cohesion vs separation for one point — the building block of the silhouette metric for choosing K.