All examples for SaaS Retention Cohort Analyzer

Small Sample Daily Retention with Statistical Warnings

Daily cohort retention analysis for a small dataset (40 users), demonstrating proper handling of an insufficient sample size with Wilson confidence intervals, power analysis, and clearly stated limitations, while still providing actionable insights.

Input

What you provide to the skill

Small sample dataset: 40 users signed up in January. Want daily cohort retention for purchase events plus 95% confidence intervals and comparison against B2C benchmarks.

Output

What the skill generates for you

Retention Cohort Analysis Report

Executive Summary

Analysis of 40 users who signed up in January 2024, tracking daily cohort retention for purchase events.

Key Finding: 77.5% Day 0 purchase rate, sharp drop to 32.5% by Day 1, stabilizing around 7.5% by Day 7+.

Statistical Warning: Sample size (n=40) is below recommended threshold (300+ per segment). Confidence intervals are wide. Results should be interpreted as directional trends only.


1. Data Quality Summary

| Metric | Value |
|--------|-------|
| Total Users | 40 |
| Date Range | 2024-01-01 to 2024-01-20 |
| Signup Cohorts | 20 daily cohorts |
| Users Who Ever Purchased | 31 (77.5%) |

Data Validation:

  • No missing user_id values
  • All dates parseable
  • Sample size is small (40 users vs. recommended 300+)
  • Cohort sizes range from 1 to 3 users (insufficient for statistical significance)
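The validation checks listed above can be sketched as a small helper. This is a sketch only; the `signups` DataFrame and its `user_id` and `signup_date` columns are assumed names, not part of the skill's actual interface:

```python
import pandas as pd

def validate_cohort_data(signups: pd.DataFrame, min_users: int = 300) -> list:
    """Return a list of data-quality warnings for a signup dataset.

    Assumes columns: user_id, signup_date (illustrative names)."""
    warnings = []
    # Check for missing user identifiers
    if signups['user_id'].isna().any():
        warnings.append("missing user_id values")
    # errors='coerce' turns unparseable dates into NaT, so we can count them
    parsed = pd.to_datetime(signups['signup_date'], errors='coerce')
    if parsed.isna().any():
        warnings.append("unparseable signup dates")
    # Flag samples below the recommended threshold
    if len(signups) < min_users:
        warnings.append(f"sample size {len(signups)} below recommended {min_users}+")
    return warnings
```

For the dataset above, this would flag only the sample-size warning, since no user_id values are missing and all dates parse.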

3. Aggregated Retention Metrics with Confidence Intervals

Method: Wilson score confidence intervals (95%)

| Period | Active Users | Retention % | 95% CI Lower | 95% CI Upper |
|--------|--------------|-------------|--------------|--------------|
| Day 0 | 31 | 77.5% | 61.5% | 89.2% |
| Day 1 | 13 | 32.5% | 18.6% | 49.1% |
| Day 2 | 8 | 20.0% | 9.1% | 35.6% |
| Day 3 | 2 | 5.0% | 0.6% | 16.9% |
| Day 7 | 5 | 12.5% | 4.2% | 26.8% |

Critical Retention Cliff: Day 0 → Day 1 drop of 45 percentage points


4. Sample Size Assessment

Current Sample: 40 users total across 20 cohorts (avg 2 users/cohort)

Required for 80% Statistical Power:

  • Detect 20pp difference: ~200 users per cohort
  • Detect 10pp difference: ~800 users per cohort

Verdict: Current dataset is severely underpowered for statistical testing. All findings are directional only.


5. Benchmark Comparison: B2C Standards

| Metric | Your Data | B2C Benchmark | Assessment |
|--------|-----------|---------------|------------|
| Day 1 Retention | 32.5% (CI: 18.6%-49.1%) | 30-40% | On par (but high uncertainty) |
| Day 7 Retention | 12.5% (CI: 4.2%-26.8%) | 15-25% | Below average (wide range) |

Important Context: B2C benchmarks typically measure “any activity” not purchase events. Purchase retention is typically 2-5x lower than activity retention, so your 12.5% may actually be above expectations for purchase-based retention.
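As a back-of-envelope check, the 2-5x rule of thumb cited above can be used to translate the activity benchmark into an implied purchase-retention range (the rule of thumb itself is the only input; the arithmetic is straightforward):

```python
# Translate the Day 7 "any activity" benchmark (15-25%) into an implied
# purchase-retention range, using the 2-5x rule of thumb cited above
activity_low, activity_high = 0.15, 0.25
implied_low = activity_low / 5    # most conservative: purchases 5x lower
implied_high = activity_high / 2  # most generous: purchases 2x lower
print(f"Implied Day 7 purchase benchmark: {implied_low:.1%}-{implied_high:.1%}")
# -> Implied Day 7 purchase benchmark: 3.0%-12.5%
```

The observed 12.5% Day 7 purchase retention sits at the top of this implied 3.0%-12.5% range, which supports the point that it may exceed expectations.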


7. Actionable Recommendations

Priority 1: Increase Sample Size (Critical)

Why: All current findings have confidence intervals too wide for decision-making.

Action Steps:

  1. Extend data collection to 300+ users (minimum)
  2. Track 2-3 full months of cohorts
  3. Ensure event tracking captures ALL purchase attempts

Priority 2: Investigate Day 0→1 Retention Cliff

Hypothesis: The 45pp drop suggests that users who don’t purchase immediately may not understand the product’s value.

Diagnostic Questions:

  1. What % of Day 0 purchases are impulse vs. considered?
  2. Do Day 1 churned users even return to the product?
  3. Are Day 0 purchases driven by signup incentives?

9. Reproducible Analysis Code

```python
import pandas as pd
import numpy as np

# purchases: DataFrame with one row per purchase event,
# columns user_id and days_since_signup (prepared upstream)
total_users = 40

# Wilson score confidence interval for each retention day
for day in range(8):
    active = purchases[purchases['days_since_signup'] == day]['user_id'].nunique()
    p = active / total_users
    z = 1.96  # z-score for a 95% CI
    denominator = 1 + z**2 / total_users
    center = (p + z**2 / (2 * total_users)) / denominator
    margin = z * np.sqrt(p * (1 - p) / total_users
                         + z**2 / (4 * total_users**2)) / denominator
    print(f"Day {day}: {p:.1%} (95% CI: {center - margin:.1%}-{center + margin:.1%})")
```

10. Limitations of This Analysis

What We Cannot Conclude:

  1. Causality: Analysis shows correlation, not causation
  2. Future Predictions: Retention curves describe past behavior only
  3. Segment Differences: Sample is too small to compare user types
  4. Statistical Significance: No tests run due to insufficient sample size

Recommended Follow-Up: Collect 10x more data (400+ users) and re-run analysis with segmentation.