All examples for Metrics & A/B Test Design Trainer

E-commerce Checkout Optimization

Practice designing an A/B test for reducing checkout steps in an e-commerce marketplace. Includes sample size calculations, metric selection feedback, and improved test design.

Input

What you provide to the skill

Give me an A/B testing scenario for e-commerce

Output

What the skill generates for you

A/B TEST DESIGN SCENARIO

Product Context: You’re a PM at an e-commerce marketplace. Your checkout flow currently requires 4 steps: Cart -> Shipping -> Payment -> Review. Your team proposes reducing this to 2 steps.

Current Metrics:

  • Daily checkout initiations: 50,000
  • Checkout completion rate: 45%
  • Average order value: $67
  • Time to complete checkout: 3.2 minutes (median)

Business Goal: Increase completed purchases without hurting order value.

Your Task: Design the complete A/B test including hypothesis, metrics, sample size, segments, and risk mitigation.


FEEDBACK ON YOUR A/B TEST DESIGN

Statistical Rigor Analysis

Python Calculation:

from scipy.stats import norm
import math

baseline = 0.45  # current checkout completion rate
target = 0.50    # completion rate we want to be able to detect (+5pp)
relative_lift = (target - baseline) / baseline  # 11.1%

z_alpha = norm.ppf(1 - 0.05 / 2)  # 1.96 (two-sided test, alpha = 0.05)
z_beta = norm.ppf(1 - 0.20)       # 0.84 (80% power)

# Pooled-variance approximation for comparing two proportions
pooled_p = (baseline + target) / 2
pooled_var = pooled_p * (1 - pooled_p)
n = ((z_alpha + z_beta) ** 2 * 2 * pooled_var) / (target - baseline) ** 2
print(f"Required per variant: {math.ceil(n):,}")  # 1,566

Results:

  • Required: ~1,566 per variant (~3,132 total)
  • With 50k daily users at 50/50 split (25,000 per variant per day): <1 day to reach the required sample
  • Recommendation: Run for 1 week to capture day-of-week effects

What to Say in Interview:
“Given a 45% baseline and targeting a 5pp increase, I need ~1,600 users per variant at 95% confidence/80% power. With 50k daily users at a 50/50 split, that’s <1 day, but I’d run 1 full week to capture day-of-week effects.”
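
Duration Sketch:

The run-length reasoning above can be checked with a short calculation. This is a minimal sketch, not part of the skill's output: the helper names sample_size_per_variant and run_length_days are hypothetical, and it assumes every checkout initiation enters the test, traffic is split evenly, and the run is rounded up to whole weeks so each weekday is covered equally. It uses the standard library's NormalDist in place of scipy so it runs without extra dependencies.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, target, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-proportion test (pooled-variance approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)
    pooled_p = (baseline + target) / 2
    pooled_var = pooled_p * (1 - pooled_p)
    n = (z_alpha + z_beta) ** 2 * 2 * pooled_var / (target - baseline) ** 2
    return math.ceil(n)


def run_length_days(n_per_variant, daily_users, n_variants=2):
    """Return (days needed to reach the sample size, run length rounded up to full weeks)."""
    users_per_variant_per_day = daily_users / n_variants
    days_for_power = n_per_variant / users_per_variant_per_day
    full_weeks = max(1, math.ceil(days_for_power / 7))  # never run less than one week
    return days_for_power, full_weeks * 7


n = sample_size_per_variant(baseline=0.45, target=0.50)
days_for_power, recommended_days = run_length_days(n, daily_users=50_000)
print(f"Per variant: {n:,}")
print(f"Days to reach sample size: {days_for_power:.2f}")
print(f"Recommended run length: {recommended_days} days")
```

Rounding up to whole weeks is a design choice: the sample size sets the minimum duration, and the weekly cycle of shopper behavior sets the practical one.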

OVERALL SCORE: 8/10

Strengths: Clear hypothesis, appropriate metric selection, practical risk mitigation
Practice More: Calculating sample size explicitly, distinguishing active test vs measurement windows