Interactive tools for power, inference, sequential design, traffic quality, and multiplicity—statistics computed in your browser, no libraries.
Estimate the per-group sample size for a two-proportion z-test and see how power rises with n. The MDE (minimum detectable effect) is the absolute lift in conversion rate (e.g., 0.02 = +2 percentage points).
Adjust inputs and click Calculate.
Curve: power vs. sample size per group at your baseline, MDE, and α (two-sided normal approximation).
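The power curve can be sketched in a few lines of plain JavaScript under the same two-sided normal approximation (α fixed at 0.05; function names like `power` are illustrative, not the page's actual code):

```javascript
// Standard normal CDF via the Abramowitz & Stegun erf approximation
// (absolute error ~1.5e-7, plenty for a power curve).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const y = 1 - ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
  return sign * y;
}
const normCdf = (z) => 0.5 * (1 + erf(z / Math.SQRT2));

// Approximate power of a two-sided two-proportion z-test at alpha = 0.05,
// with n subjects per group, baseline rate p1, treatment rate p2 = p1 + MDE.
function power(p1, p2, n) {
  const zAlpha = 1.959964; // Phi^{-1}(1 - 0.05/2)
  const se = Math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n);
  // Ignores the negligible chance of rejecting in the wrong tail.
  return normCdf(Math.abs(p2 - p1) / se - zAlpha);
}
```

Under this approximation, a 10% baseline with a +2 pp MDE crosses 80% power somewhere around n ≈ 3,800–3,900 per group.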
Draw binomial outcomes for control and treatment, then run a pooled two-sided z-test. Under the null, p-values should be uniform; under a real effect, power shows up as more small p-values.
Z-test uses pooled SE under H₀: p_A = p_B. CI is Wald for the difference using unpooled SE.
Run a single simulation to see rates, z, p-value, and CI.
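Given the observed counts, the test statistic is deterministic. A sketch consistent with the description above (pooled SE for z, unpooled Wald SE for the CI; names like `twoPropZTest` are mine):

```javascript
// Standard normal CDF (Abramowitz & Stegun erf approximation).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const y = 1 - ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
  return sign * y;
}
const normCdf = (z) => 0.5 * (1 + erf(z / Math.SQRT2));

// Two-sided two-proportion z-test: xA successes out of nA in control,
// xB out of nB in treatment.
function twoPropZTest(xA, nA, xB, nB) {
  const pA = xA / nA, pB = xB / nB, diff = pB - pA;
  // Pooled SE under H0: p_A = p_B.
  const pPool = (xA + xB) / (nA + nB);
  const sePooled = Math.sqrt(pPool * (1 - pPool) * (1 / nA + 1 / nB));
  const z = diff / sePooled;
  const pValue = 2 * (1 - normCdf(Math.abs(z)));
  // 95% Wald CI for the difference, using the unpooled SE.
  const seUnpooled = Math.sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB);
  const ci = [diff - 1.959964 * seUnpooled, diff + 1.959964 * seUnpooled];
  return { pA, pB, diff, z, pValue, ci };
}
```

For example, `twoPropZTest(1000, 10000, 1100, 10000)` (10.0% vs. 11.0%) gives z ≈ 2.31 and p ≈ 0.021.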
Cumulative z-statistics are compared against O'Brien–Fleming-style boundaries (5 looks) and against a fixed-horizon test evaluated only at the end. Re-using the naive α threshold at every peek inflates the false-positive rate; that is what "peeking" illustrates.
Boundaries approximate standard two-sided OBF for K=5 interim analyses at α=0.05 (Jennison & Turnbull–style nominal levels). The path is one synthetic cumulative z built from partial sums of normal noise plus drift.
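The inflation from peeking can be reproduced with a small Monte Carlo sketch (seeded PRNG for reproducibility; the OBF boundaries themselves are omitted here, and all names are mine, not the page's):

```javascript
// mulberry32: tiny seeded PRNG so the simulation is reproducible.
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6D2B79F5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draws via Box-Muller.
function makeNormal(rand) {
  return () => {
    const u1 = Math.max(rand(), 1e-12), u2 = rand();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  };
}

// Fraction of truly-null experiments "rejected" by (a) peeking at every
// look with the naive |z| > 1.96 rule, and (b) one test at the end.
// Each experiment builds a cumulative z from partial sums of N(0,1) noise,
// with looks at 20%, 40%, ..., 100% of the data.
function falsePositiveRates(sims = 5000, perLook = 200, looks = 5, seed = 42) {
  const normal = makeNormal(mulberry32(seed));
  let peeked = 0, fixed = 0;
  for (let s = 0; s < sims; s++) {
    let sum = 0, n = 0, anyPeekHit = false, finalZ = 0;
    for (let k = 0; k < looks; k++) {
      for (let i = 0; i < perLook; i++) { sum += normal(); n++; }
      const z = sum / Math.sqrt(n);
      if (Math.abs(z) > 1.959964) anyPeekHit = true;
      finalZ = z;
    }
    if (anyPeekHit) peeked++;
    if (Math.abs(finalZ) > 1.959964) fixed++;
  }
  return { peeking: peeked / sims, fixedHorizon: fixed / sims };
}
```

With no true effect, the fixed-horizon rate lands near the nominal 5%, while checking at every look pushes it to roughly 14% for five equally spaced looks.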
Compare observed assignment counts to the expected split using a chi-squared goodness-of-fit test. Large discrepancies suggest assignment bugs, targeting issues, or funnel drop-off that is not random.
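For a two-arm experiment the test has a single degree of freedom, so the p-value can come from the normal CDF alone (χ² = z² when df = 1). A sketch under that simplification, with illustrative names:

```javascript
// Standard normal CDF (Abramowitz & Stegun erf approximation).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const y = 1 - ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
  return sign * y;
}
const normCdf = (z) => 0.5 * (1 + erf(z / Math.SQRT2));

// Chi-squared goodness-of-fit for a two-arm split. expectedShareA is the
// intended traffic fraction for arm A (0.5 for a 50/50 split).
function srmCheck(obsA, obsB, expectedShareA = 0.5) {
  const total = obsA + obsB;
  const expA = total * expectedShareA, expB = total - expA;
  const chi2 = (obsA - expA) ** 2 / expA + (obsB - expB) ** 2 / expB;
  // With 2 cells there is 1 degree of freedom, so chi2 = z^2 and
  // P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
  const pValue = 2 * (1 - normCdf(Math.sqrt(chi2)));
  return { chi2, pValue };
}
```

For instance, `srmCheck(5100, 4900)` against an intended 50/50 split gives χ² = 4 and p ≈ 0.0455, a discrepancy worth investigating before trusting the experiment's results.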
Simulate 20 independent metrics, all truly null (noise only). At α=0.05 you still expect about one “significant” metric by chance. Bonferroni controls family-wise error; Benjamini–Hochberg controls false discovery rate (less conservative).
| Metric | p-value | Raw α=0.05 | Bonferroni | BH (FDR 5%) |
|---|---|---|---|---|
Each metric is a two-sided z-test from equal n=10,000 per arm with identical true rates (0.10)—only sampling noise differs.
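The two corrections in the table take only a few lines each (q is the FDR target; function names are mine):

```javascript
// Bonferroni: compare each p-value to alpha / m, where m is the number
// of metrics tested. Controls family-wise error.
function bonferroniRejects(pvals, alpha = 0.05) {
  const m = pvals.length;
  return pvals.filter((p) => p <= alpha / m).length;
}

// Benjamini-Hochberg step-up: sort p-values ascending, find the largest k
// with p_(k) <= (k / m) * q, and reject hypotheses 1..k. Controls the
// false discovery rate; less conservative than Bonferroni.
function bhRejects(pvals, q = 0.05) {
  const m = pvals.length;
  const sorted = [...pvals].sort((a, b) => a - b);
  let k = 0;
  for (let i = 0; i < m; i++) {
    if (sorted[i] <= ((i + 1) / m) * q) k = i + 1;
  }
  return k;
}
```

On a hypothetical vector p = [0.001, 0.004, 0.012, 0.02, 0.03, 0.2, 0.3, 0.4, 0.5, 0.9] with α = q = 0.05, raw thresholding flags 5 metrics, Bonferroni keeps 2, and BH keeps 4.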