
A/B Test Sample Size Calculator

Required visitors per variant to detect a target lift at your power and significance — before you launch.

Always size the experiment BEFORE launching, then run to that number and check once. Halving the detectable effect roughly quadruples the required sample. Low baseline rates and small target lifts demand huge samples — which is why tiny sites can't reliably A/B test small changes.
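The "check once" advice can be illustrated with a small A/A simulation, where both variants share the same true rate so every significant result is a false positive. This is only a sketch: the 10% rate, 2,000 visitors per variant, 10 interim looks, 400 runs and the pooled two-proportion z-test are all illustrative assumptions, not values or methods taken from the calculator.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_stat(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (assumed test, for illustration)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se


def false_positive_rates(rate=0.10, n=2000, looks=10, runs=400,
                         alpha=0.05, seed=1):
    """Return (peeking_rate, single_look_rate) for an A/A test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n // looks  # visitors per variant between looks
    peeking_hits = 0
    single_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False      # significant at ANY look
        significant = False  # significant at the most recent look
        for i in range(1, n + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % step == 0:
                significant = abs(z_stat(conv_a, i, conv_b, i)) > z_crit
                flagged = flagged or significant
        peeking_hits += flagged
        single_hits += significant  # the last look happens at i == n
    return peeking_hits / runs, single_hits / runs


peeking, single = false_positive_rates()
print(f"stop at first significant look: {peeking:.1%}")
print(f"check once at the planned n:    {single:.1%}")
```

Because the final look is one of the interim looks, the peeking rate can never come out below the single-look rate; in runs like this it is typically several times higher than the nominal 5%.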

Formula

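n per variant = (z_α/2·√(2p̄q̄) + z_β·√(p₁q₁+p₂q₂))² / (p₂-p₁)², where p₁ is the baseline rate, p₂ the target rate, qᵢ = 1-pᵢ, p̄ = (p₁+p₂)/2 and q̄ = 1-p̄. The sample grows with the inverse square of the effect p₂-p₁; higher power raises z_β and therefore the sample.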
References: Kohavi, Tang & Xu (2020), Trustworthy Online Controlled Experiments; Cohen (1988), Statistical Power Analysis
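The formula above can be sketched in a few lines of Python. The function name, the choice to express the lift as a relative change on the baseline, and the 5% / 80% defaults are assumptions made for this example; the calculator's own inputs and defaults may differ.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors per variant for a two-sided test of two proportions.

    baseline      -- control conversion rate p1, e.g. 0.05
    relative_lift -- target lift relative to baseline, e.g. 0.10 for +10%
                     (assumed interpretation of "target lift")
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and differ")
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_(alpha/2)
    z_beta = NormalDist().inv_cdf(power)           # z_beta
    numerator = (z_alpha * sqrt(2 * p_bar * q_bar)
                 + z_beta * sqrt(p1 * q1 + p2 * q2)) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n = sample_size_per_variant(baseline=0.05, relative_lift=0.10)
print(f"per variant: {n:,}   total (2 variants): {2 * n:,}")
```

With these illustrative inputs (5% baseline, 10% relative lift, α = 0.05, 80% power) the result is roughly 31,000 visitors per variant.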

About A/B Test Sample Size Calculator

Running an A/B test without first computing the sample size is how experiments lie to you — underpowered tests miss real effects, and 'run until it looks significant' guarantees false positives. This calculator gives the required visitors per variant to detect a target lift at your chosen statistical power and significance level, before you launch. It encodes the harsh reality of experimentation: detecting smaller effects costs quadratically more traffic, and low baseline conversion rates inflate the requirement, which is precisely why low-traffic sites struggle to test small UI changes reliably.

How to use A/B Test Sample Size Calculator

  1. Enter your values into A/B Test Sample Size Calculator; sensible, domain-typical defaults are pre-filled so you see a real result immediately.
  2. The result recomputes live using the formula shown on the page; there is no button to press.
  3. Adjust any input to compare scenarios, then read the worked example to see the substituted numbers.

Why use A/B Test Sample Size Calculator?

  • Computes A/B Test Sample Size instantly in your browser — no sign-up, no upload, no server round-trip.
  • 100% free and unlimited, with the exact formula shown on the page.
  • Runs entirely client-side, so every value you enter stays private on your device.
  • Live recompute as you type, with a worked example and authoritative references for trust.

Frequently asked questions

Why must I compute sample size before the test?

To fix the stopping point in advance and avoid 'peeking': repeatedly checking and stopping as soon as the result is significant inflates the false-positive rate badly. Pre-computing the sample also tells you whether the test is even feasible: if you need 200,000 visitors per variant and each variant gets 1,000 visitors a week, that is 200 weeks, close to four years, and you should test a bigger change instead.
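The feasibility check is simple arithmetic, sketched here. The function name and the split of traffic evenly across variants are assumptions for the example; the 200,000 and 1,000/week figures come from the answer above, read as per-variant numbers.

```python
from math import ceil


def weeks_to_run(n_per_variant, weekly_visitors_total, variants=2):
    """Weeks needed when total weekly traffic is split evenly across variants."""
    if n_per_variant <= 0 or weekly_visitors_total <= 0 or variants < 2:
        raise ValueError("inputs must be positive, with at least 2 variants")
    return ceil(n_per_variant * variants / weekly_visitors_total)


# 1,000 visitors/week per variant means 2,000/week in total for 2 variants.
weeks = weeks_to_run(n_per_variant=200_000, weekly_visitors_total=2_000)
print(weeks, "weeks, about", round(weeks / 52, 1), "years")  # 200 weeks, about 3.8 years
```

If the 1,000/week were the site's total traffic instead, the same test would take 400 weeks.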

What is statistical power and what value should I use?

Power is the probability of detecting a real effect of the size you care about — 80% is the convention (a 20% chance of missing a true effect). Higher power (90%) needs more samples but reduces the chance of a false negative. Choose 80% unless the cost of missing a real winner is high, then go to 90%.
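To see what the move from 80% to 90% power costs, the page's formula can be evaluated at both settings. This is a sketch under assumed inputs (5% baseline, 10% relative lift, α = 0.05); the function name and its parameters are made up for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-variant sample size from the formula shown on the page."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n80 = sample_size_per_variant(0.05, 0.10, power=0.80)
n90 = sample_size_per_variant(0.05, 0.10, power=0.90)
print(f"80% power: {n80:,}")
print(f"90% power: {n90:,}  ({n90 / n80:.2f}x)")
```

Under these assumptions 90% power asks for roughly a third more visitors per variant than 80%.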

Why do smaller effects need so many more samples?

Because the required sample scales with 1/(effect size)². Halving the lift you want to detect roughly quadruples the sample needed. If a 10% relative lift takes a few thousand visitors per variant, a 2% lift (five times smaller) takes roughly 25 times as many, which puts it in the tens to hundreds of thousands. This is the fundamental tax of precision and the reason tiny improvements are hard to validate.
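The inverse-square scaling can be checked numerically with the page's formula. The 5% baseline and the three lifts below are illustrative assumptions, as is the function name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-variant sample size from the formula shown on the page."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


reference = sample_size_per_variant(0.05, 0.10)
for lift in (0.10, 0.05, 0.02):
    n = sample_size_per_variant(0.05, lift)
    print(f"lift {lift:.0%}: n = {n:,}  ({n / reference:.1f}x the 10% case)")
```

The ratios land near 4x and 25x rather than exactly on them, because the variance terms in the numerator also shift slightly as p₂ moves.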

What's the minimum detectable effect (MDE)?

The smallest lift you want the test to reliably catch — set it to the smallest improvement that would actually change your decision to ship. Setting MDE too small makes the test impractically large; too large and you'll miss meaningful wins. It's the most important and most-neglected input, so choose it from business value, not optimism.
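One way to sanity-check an MDE is to invert the calculation: given the traffic you can actually collect, find the smallest relative lift the test could detect. The sketch below does this by bisection on the page's formula. The function names, the 32,000-visitor budget and the assumption that the required sample falls steadily as the lift grows are all illustrative, not part of the calculator.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-variant sample size from the formula shown on the page."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def minimum_detectable_lift(baseline, n_available, alpha=0.05, power=0.80):
    """Smallest relative lift detectable with n_available visitors per variant."""
    lo = 1e-6
    hi = (1 - baseline) / baseline * 0.999  # keeps the target rate below 1
    if sample_size_per_variant(baseline, hi, alpha, power) > n_available:
        raise ValueError("no lift is detectable with this much traffic")
    for _ in range(60):  # bisection: hi always stays feasible
        mid = (lo + hi) / 2
        if sample_size_per_variant(baseline, mid, alpha, power) > n_available:
            lo = mid
        else:
            hi = mid
    return hi


lift = minimum_detectable_lift(baseline=0.05, n_available=32_000)
print(f"smallest detectable relative lift: {lift:.1%}")
```

If the lift this returns is larger than the smallest improvement that would change your decision, the test is not worth running as designed.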
