Zero Signup Tools: Free browser tools

A/B Test Significance Calculator

Calculate statistical significance, p-value, and confidence interval for A/B test conversion rates. Two-proportion z-test, runs in your browser.

Control group (A)

Enter the visitor count and conversion count for your baseline. Conversions can be any binary event: signups, purchases, clicks, or anything counted per visitor.

Control conversion rate: 5.00%

Variants

Add one or more variants to compare against the control. Each variant is compared independently to the control using a two-proportion z-test.

  • Variant B

    Conversion rate: 5.60%

Test settings

Pick a confidence level and a test type. The default 95% two-tailed test is the standard for product and marketing experiments.

Confidence level

Test type

Use two-tailed when you want to detect any change in either direction. Use one-tailed only when you have decided in advance that you only care about variants outperforming control. Critical z at this setting: 1.96.
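The critical z value quoted above follows from the confidence level and the test type. A minimal sketch, assuming the standard normal quantile is all that is needed; the function name `critical_z` is illustrative and not taken from the tool:

```python
from statistics import NormalDist


def critical_z(confidence: float, two_tailed: bool = True) -> float:
    """Return the standard normal critical value for a confidence level.

    A two-tailed test splits alpha across both tails; a one-tailed test
    puts all of alpha in the upper tail.
    """
    alpha = 1.0 - confidence
    tail_area = alpha / 2.0 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


z_two = critical_z(0.95, two_tailed=True)
z_one = critical_z(0.95, two_tailed=False)
print(f"two-tailed: {z_two:.2f}, one-tailed: {z_one:.2f}")
```

At 95% confidence the one-tailed threshold is lower than the two-tailed one, which is why a one-tailed test should only be chosen before the data is seen.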

Results

Each variant is compared to the control with a two-proportion z-test (pooled standard error). Confidence intervals use Wilson for rates and the unpooled Wald formula for the lift.
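The pooled two-proportion z-test described above can be sketched as follows. The input counts are an assumption: the page shows only rates, and 10,000 visitors per group with 500 and 560 conversions is one set of inputs consistent with the displayed 5.00% and 5.60%. The function name is illustrative, not the tool's own code.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n_a, x_a, n_b, x_b, two_tailed=True):
    """Pooled two-proportion z-test of variant (b) against control (a).

    Returns (z, p_value). The one-tailed p-value tests only for the
    variant outperforming the control.
    """
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled rate under the null hypothesis that both rates are equal.
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    norm = NormalDist()
    if two_tailed:
        p_value = 2 * norm.cdf(-abs(z))
    else:
        p_value = 1 - norm.cdf(z)
    return z, p_value


# Assumed counts (not stated on the page): 10,000 visitors per group.
z, p = two_proportion_z_test(10_000, 500, 10_000, 560)
print(f"z = {z:.4f}, two-tailed p = {p:.4f}")
```

With these assumed counts the sketch gives a z-score of about 1.89 and a two-tailed p-value of about 0.058, in line with the results card.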

  • Variant B: Not significant at 95% confidence

    Control rate: 5.00% (4.59% to 5.44%)

    Variant rate: 5.60% (5.17% to 6.07%)

    Absolute lift: +0.60% (95% CI: -0.02% to +1.22%)

    Relative lift: +12.00% vs. control rate

    z-score: 1.8938

    p-value (two-tailed): 0.0583

    Observed power: 47.4% (post-hoc)

    Verdict: Fail to reject H0 (alpha = 0.05)

What the numbers mean

  • Conversion rate is conversions divided by visitors. The interval below it is a Wilson score interval at the chosen confidence level. Wilson stays well-calibrated near 0% and 100%, unlike the textbook normal approximation.
  • Absolute lift is the difference in rates in percentage points. A control of 5% and a variant of 6% is +1 point of absolute lift.
  • Relative lift is the same difference divided by the control rate, expressed as a percentage. A change from 5% to 6% is a +20% relative lift.
  • z-score measures how many pooled standard errors separate the two rates. The larger the absolute value of z, the smaller the p-value.
  • p-value is the probability of seeing a difference at least this extreme if the two rates were really identical. If p is below alpha (1 minus your confidence level, so 0.05 at 95% confidence), the result is statistically significant.
  • Observed power is the estimated chance that a test of this size would reach significance at your alpha if the true effect were equal to the observed one. Low power on a non-significant test means the result is inconclusive (not a confirmation that the variants are equal).
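The interval and lift definitions in the list above can be sketched in a few lines. This is a rough illustration rather than the tool's code: the function names are made up, a two-sided critical value is assumed for both intervals, and the counts (10,000 visitors per group, 500 and 560 conversions) are assumed inputs chosen to match the displayed rates.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, confidence=0.95):
    """Wilson score interval for a conversion rate x / n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def lift_summary(n_a, x_a, n_b, x_b, confidence=0.95):
    """Absolute lift, its unpooled Wald interval, and relative lift."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Unpooled standard error: each group keeps its own variance term.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return {
        "absolute": diff,
        "ci": (diff - z * se, diff + z * se),
        "relative": diff / p_a,
    }


# Assumed counts (not stated on the page).
lo, hi = wilson_interval(500, 10_000)
lift = lift_summary(10_000, 500, 10_000, 560)
print(f"control rate CI: {lo:.2%} to {hi:.2%}")
print(f"absolute lift: {lift['absolute']:+.2%}, relative: {lift['relative']:+.2%}")
```

The relative lift divides by the control rate, so it is undefined when the control has zero conversions; a production version would need to guard against that case.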

Things this calculator assumes

  • Each visitor is counted once and either converts or does not. The z-test does not handle repeat visits, multi-touch events, or revenue-per-visitor by itself.
  • The sample sizes are large enough for the normal approximation. When either group has fewer than about 30 conversions, treat the p-value as approximate.
  • Stop the test based on a pre-declared sample size or a sequential testing scheme, not by checking the p-value every day. Peeking inflates false positives.
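  • Each variant is compared independently to the control. With many variants, consider a multiple-comparison correction before declaring a winner. Bonferroni divides alpha by the number of comparisons; Holm is a step-down version that tests the smallest p-value at that threshold and relaxes it for each subsequent one, so it is less conservative.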
  • Everything runs locally in your browser. The visitor and conversion counts you enter are never uploaded.
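The Bonferroni and Holm corrections mentioned in the list above can be applied to the per-variant p-values by hand. A minimal sketch, with hypothetical p-values for three variants; the function names are illustrative and nothing here is confirmed to be built into the calculator:

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only where p is at or below alpha / number of comparisons."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest.

    The k-th smallest p-value (k starting at 0) is compared with
    alpha / (m - k); testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical p-values for variants B, C, and D against one control.
p_values = [0.012, 0.020, 0.030]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

In this made-up example Bonferroni keeps only the first variant, while Holm keeps all three, which illustrates why Holm is often preferred when several variants are close to the threshold.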

How to use

  1. Enter the control group's visitor count and conversion count. A conversion is any binary event you are testing: signup, purchase, click, form submit, anything counted per visitor.
  2. Add one or more variants and enter the visitor and conversion counts for each. Use Add variant for multi-arm tests with up to eight variants.
  3. Pick a confidence level. 95 percent is the standard for product and marketing experiments. Use 99 percent for higher-stakes decisions or 90 percent for early-stage exploration.
  4. Choose the test type. Two-tailed is the safe default. Only use one-tailed if the experiment was designed in advance to only act on variants outperforming control.
  5. Read the results card for each variant. The colored badge tells you whether the variant is a significant winner, a significant loser, or not significant at your confidence level.
  6. Check the p-value and confidence interval. A p-value below 1 minus your confidence level means significant. The confidence interval shows the plausible range for the true lift.
  7. Glance at observed power. If a result is not significant and power is well under 80 percent, treat the test as inconclusive and consider collecting more traffic rather than concluding there is no effect.
  8. Click Copy summary to grab a clean text block of the verdict, lift, p-value, and z-score for every variant, ready to paste into a launch doc or a Slack message.
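Steps 5 to 7 above combine the p-value, the direction of the effect, and observed power into a reading. The sketch below shows one way to do that. The page does not state its power formula, so the normal-approximation formula used here is an assumption, as are the function names, the exact 80 percent cutoff, and the counts (10,000 visitors per group, 500 and 560 conversions).

```python
from math import sqrt
from statistics import NormalDist


def observed_power(n_a, x_a, n_b, x_b, confidence=0.95):
    """Post-hoc power of a two-tailed test at the observed effect.

    Assumed formula: treat the observed difference as the true effect
    and use the unpooled standard error with a normal approximation.
    """
    norm = NormalDist()
    p_a, p_b = x_a / n_a, x_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = norm.inv_cdf(1 - (1 - confidence) / 2)
    shift = abs(p_b - p_a) / se
    # Probability of landing beyond either critical value.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


def read_result(p_value, z, power, confidence=0.95):
    """Map p-value, effect direction, and power to a plain-text reading."""
    alpha = 1 - confidence
    if p_value < alpha:
        return "significant winner" if z > 0 else "significantly worse than control"
    if power < 0.80:
        return "not significant (inconclusive, low power)"
    return "not significant"


# Assumed counts; p-value and z-score taken from the results card.
power = observed_power(10_000, 500, 10_000, 560)
print(f"observed power: {power:.1%}")
print(read_result(0.0583, 1.8938, power))
```

Under these assumptions the power comes out close to the 47.4% shown on the results card, so the example reads as inconclusive rather than as evidence of no effect.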

About this tool

A/B Test Significance Calculator turns the visitor and conversion counts from a running or finished split test into the standard statistical readout that product managers, growth marketers, and experimentation teams need before declaring a winner. Enter the control group, add one or more variants, and the tool computes each variant's conversion rate with a Wilson confidence interval at the chosen level, the absolute lift in percentage points, and the relative lift in percent versus control. It also reports the z-score from a two-proportion z-test using a pooled standard error, the one-tailed and two-tailed p-values from the standard normal CDF, the confidence interval for the lift built from the unpooled Wald standard error, and the observed post-hoc power of the test at your chosen alpha.

Pick the standard 90, 95, or 99 percent confidence level or type a custom value, and switch between a two-tailed test (the default and the right choice when you care about any direction of change) and a one-tailed test (only when the experiment was set up in advance to detect variants outperforming control). Every result is tagged with a clear verdict: a green significant winner, a red significantly worse than control, or a neutral not significant at the chosen confidence level, so non-statisticians on the team can read the table without having to interpret a raw p-value. Up to eight variants can be compared against a single control, which is the typical setup for a multi-arm split test or an A/B/n redesign.

The Wilson interval on each rate keeps the confidence bound calibrated even when conversion rates are very low or very high, where the textbook normal approximation breaks down. The lift interval uses the unpooled standard error because the pooled SE is only appropriate under the null hypothesis of equal rates (which is what the z-test itself is checking) and can misstate the uncertainty in the actual difference when the two rates differ. Observed power gives a quick check on whether a non-significant result reflects no effect or simply too small a sample: if power is well below 80 percent at the observed effect, the test is inconclusive rather than negative.

The Copy summary button produces a multi-line text block ready to drop into a Slack message, a launch doc, a pull request description, or a quarterly experimentation review. Useful for marketing landing page tests, email subject line tests, signup funnel optimization, checkout flow experiments, pricing page tests, push notification copy tests, mobile app onboarding splits, ad creative tests, and any binary-outcome experiment where one group sees a control experience and other groups see one or more variants. Everything runs locally in your browser; the visitor and conversion counts you enter are never uploaded.

Free to use. Works in your browser. No signup, no login.
