All examples for Effect Size Calculator & Interpretation Guide

A/B Test: Large Sample Statistical Trap

Demonstrates the classic 'large sample trap', where statistical significance (p=0.044) masks a very small effect size (Cohen's h=0.018). Provides business impact scenarios at different traffic levels and a decision framework for implementation.

Input

What you provide to the skill

A/B test: 25,000 users per variant. Control 8.2% conversion, treatment 8.7% conversion. p=0.044. Worth implementing?

Output

What the skill generates for you

======================================================================
A/B Test Effect Size Analysis

Cohen’s h: 0.0180

Absolute lift: 0.0050 (0.50 percentage points)
Relative lift: 0.0610 (6.1%)
95% CI for difference: [0.0001, 0.0099]
[0.01%, 0.99%]

Number Needed to Treat (NNT): 200
(Need 200 users in treatment to get 1 extra conversion)

z-statistic: 2.010
p-value: 0.044 (two-tailed)
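
For reference, the figures above can be reproduced with a few lines of
NumPy/SciPy. This is a minimal sketch of the standard formulas
(arcsine-transformed proportions for Cohen's h, a pooled-SE z-test, a Wald
CI), not the skill's actual implementation; results match up to rounding:

```python
import numpy as np
from scipy import stats

n1 = n2 = 25_000
p1, p2 = 0.082, 0.087  # control and treatment conversion rates

# Cohen's h: difference of arcsine-transformed proportions
h = 2 * np.arcsin(np.sqrt(p2)) - 2 * np.arcsin(np.sqrt(p1))  # ~0.018

abs_lift = p2 - p1            # 0.0050 (0.50 percentage points)
rel_lift = abs_lift / p1      # 0.0610 (6.1%)
nnt = 1 / abs_lift            # 200

# Two-proportion z-test with a pooled standard error
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se_pooled = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = abs_lift / se_pooled            # ~2.010
p_value = 2 * stats.norm.sf(z)      # two-tailed, ~0.044

# 95% Wald CI for the difference, using the unpooled standard error
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (abs_lift - 1.96 * se, abs_lift + 1.96 * se)  # ~[0.0001, 0.0099]

print(f"h = {h:.3f}, z = {z:.3f}, p = {p_value:.3f}")
print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}], NNT = {nnt:.0f}")
```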

======================================================================
INTERPRETATION

Statistical Significance: YES (p = 0.044)

  • With 50,000 total users, the test has high power to detect small effects
  • The difference is statistically significant at the conventional 0.05 level

Effect Size: SMALL

  • Cohen’s h = 0.0180 (well below the conventional 0.20 benchmark for a small effect)
  • Absolute lift: 0.5 percentage points
  • Relative lift: 6.1% improvement over baseline

Practical Significance: DEPENDS ON SCALE

======================================================================
BUSINESS IMPACT SCENARIOS

Monthly impact at different scales:
(assuming 0.5pp lift = 6.1% relative improvement)

Low traffic: 10,000 monthly visitors
→ 50 extra conversions/month
@ $25 AOV: $1,250/month ($15,000/year)
@ $50 AOV: $2,500/month ($30,000/year)
@ $100 AOV: $5,000/month ($60,000/year)

Medium traffic: 100,000 monthly visitors
→ 500 extra conversions/month
@ $25 AOV: $12,500/month ($150,000/year)
@ $50 AOV: $25,000/month ($300,000/year)
@ $100 AOV: $50,000/month ($600,000/year)

High traffic: 1,000,000 monthly visitors
→ 5,000 extra conversions/month
@ $25 AOV: $125,000/month ($1,500,000/year)
@ $50 AOV: $250,000/month ($3,000,000/year)
@ $100 AOV: $500,000/month ($6,000,000/year)
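
The arithmetic behind these scenarios is simply visitors × absolute lift ×
average order value. A minimal sketch (the traffic tiers and AOV values are
the illustrative ones used above):

```python
LIFT_PP = 0.005  # absolute lift: 0.5 percentage points

for visitors in (10_000, 100_000, 1_000_000):
    extra = visitors * LIFT_PP  # extra conversions per month
    print(f"{visitors:,} visitors -> {extra:,.0f} extra conversions/month")
    for aov in (25, 50, 100):
        monthly = extra * aov
        print(f"  @ ${aov} AOV: ${monthly:,.0f}/month (${monthly * 12:,.0f}/year)")
```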

======================================================================
DECISION FRAMEWORK

✓ IMPLEMENT IF:

  1. High traffic volume (>100K visitors/month)

    • At 1M visitors: 5,000 extra conversions/month
    • At $50 AOV: $250,000/month revenue impact
  2. Low implementation cost

    • Simple CSS/copy changes
    • No major engineering effort
    • Low maintenance burden
  3. No negative impacts

    • No increase in returns/refunds
    • No degradation of user experience
    • No harm to brand perception

✗ RECONSIDER IF:

  1. Low traffic volume (<10K visitors/month)

    • 50 conversions/month may not justify effort
  2. High implementation cost

    • Requires major platform changes
    • Ongoing maintenance complexity
    • Technical debt concerns
  3. Opportunity cost is high

    • Dev time could be spent on bigger wins
    • Other tests showing larger effects
    • Strategic priorities elsewhere

? INVESTIGATE FURTHER:

  1. Segment analysis

    • Does effect vary by user segment?
    • Mobile vs. desktop differences?
    • New vs. returning customers?
  2. Long-term effects

    • Does lift sustain over time?
    • Any learning effects?
    • Impact on LTV, not just initial conversion?
  3. Qualitative data

    • What does user feedback say?
    • Any unexpected behaviors?
    • Does it align with brand strategy?

======================================================================
STATISTICAL VS. PRACTICAL SIGNIFICANCE

This is a textbook example of the “large sample trap”:

• Statistical significance: p = 0.044 (significant at the conventional 0.05 level)
• Effect size: Cohen’s h = 0.0180 (very small)

With 25,000 users per group, a two-tailed test at α = 0.05 has ~80% power
to detect effects as small as h ≈ 0.025. In other words, the test can
reliably detect effects that are too small to matter in practice; the
observed h = 0.018 sits below even that threshold, which is why the result
only narrowly clears the 0.05 bar.
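
A quick way to sanity-check that power claim, assuming SciPy and the
standard normal-approximation formula for two equal groups (variable names
here are illustrative):

```python
from scipy import stats

alpha, power, n_per_group = 0.05, 0.80, 25_000

z_alpha = stats.norm.ppf(1 - alpha / 2)  # 1.96 for a two-tailed test
z_power = stats.norm.ppf(power)          # 0.84

# Smallest Cohen's h detectable with the requested power:
# h_mde = (z_alpha + z_power) * sqrt(2 / n)
h_mde = (z_alpha + z_power) * (2 / n_per_group) ** 0.5
print(f"Minimum detectable effect: h = {h_mde:.3f}")  # ~0.025
```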

The p-value tells you the effect is likely REAL (unlikely to be chance alone).
The effect size tells you the effect is SMALL (0.5pp improvement).

Whether “small but real” is worth implementing depends entirely on:
• Your traffic volume
• Your implementation costs
• Your opportunity costs

======================================================================
APA-STYLE REPORTING

The treatment group showed a statistically significant but small increase
in conversion rate (8.7%) compared to control (8.2%), z = 2.01,
p = 0.044, h = 0.018, 95% CI [0.01%, 0.99%].
The absolute lift of 0.5 percentage points represents a
6.1% relative improvement over baseline. At scale, this
translates to approximately 500 additional conversions
per 100,000 visitors.

======================================================================
VERDICT

SHORT ANSWER: Probably YES, but it depends on your scale.

The effect is statistically significant (p = 0.044) but small in magnitude
(6.1% relative lift). The decision to implement should be
based on:

  1. TRAFFIC: If you have >100K monthly visitors, the cumulative impact
    likely justifies implementation.

  2. COST: If implementation is low-effort (simple changes), the ROI is
    favorable even at modest traffic levels.

  3. PRIORITY: If you have other experiments showing larger effects or
    higher-priority initiatives, consider those first.

RECOMMENDATION:
• High traffic (>500K/month) + low cost → IMPLEMENT
• Medium traffic (100K-500K/month) + low cost → IMPLEMENT
• Low traffic (<100K/month) + low cost → MAYBE (run cost-benefit)
• Any traffic + high cost → INVESTIGATE FURTHER (segment analysis)
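
For teams that want this matrix as a checklist in code, one hypothetical
encoding of the thresholds above (specific to this test, not a
general-purpose rule):

```python
def recommend(monthly_visitors: int, low_cost: bool) -> str:
    """Map the recommendation matrix above onto two inputs."""
    if not low_cost:
        return "INVESTIGATE FURTHER (segment analysis)"
    if monthly_visitors >= 100_000:  # covers medium and high traffic tiers
        return "IMPLEMENT"
    return "MAYBE (run cost-benefit)"

print(recommend(1_000_000, low_cost=True))  # IMPLEMENT
print(recommend(50_000, low_cost=True))     # MAYBE (run cost-benefit)
```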

The key insight: This is not a transformative change, but a marginal
improvement. At sufficient scale, marginal improvements compound into
meaningful business impact.

======================================================================