Small Dataset Feasibility Analysis
Guidance on building a churn model with only 28 users (below the 30-user minimum), including statistical limitations, confidence thresholds, and alternative approaches.
Input
What you provide to the skill
I have 28 users with login_count, feature_adoption, and churned column. Can I build a model with this smaller dataset?
Output
What the skill generates for you
Small Dataset Feasibility Analysis
Your Situation
- Sample size: 28 users
- Features: login_count, feature_adoption, churned
- Below minimum: The skill recommends a minimum of 30 users
Can You Build a Model? YES, with caveats
Option 1: Override Minimum Sample Size (Recommended)
You can bypass the 30-user validation by using the --min-sample flag:
cd /workspace/webserver/lib/smartvox/skill_hub/skills/churn-risk-scoring-model-builder
uv run scripts/calculate_churn_risk.py your_data.csv --format markdown --min-sample 20 > analysis.md
This will allow the script to process your 28-user dataset.
Critical Requirements You MUST Meet
Even with 28 users, you need:
- Minimum churn representation: At least 5% churned users
  - With 28 users, you need at least 2 churned users (7%)
  - Ideally 3-4 churned users (10-14%) for meaningful patterns
- Binary churned column: Must contain only 0 (retained) or 1 (churned)
- Numeric behavioral columns: login_count and feature_adoption must be numeric values
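These requirements can be checked up front with pandas before invoking the script. A minimal sketch (`check_dataset` is a hypothetical helper name; the column names match the dataset described above):

```python
import pandas as pd

def check_dataset(df: pd.DataFrame, min_users: int = 20, min_churned: int = 2) -> bool:
    """Return True if the dataset meets the requirements listed above."""
    return bool(
        len(df) >= min_users
        # churned must be binary 0/1
        and set(df["churned"].unique()) <= {0, 1}
        # enough churned users to find any pattern at all
        and df["churned"].sum() >= min_churned
        # behavioral columns must be numeric
        and pd.api.types.is_numeric_dtype(df["login_count"])
        and pd.api.types.is_numeric_dtype(df["feature_adoption"])
    )
```

Run it on your dataframe before spending time on the full analysis; a False here means the script's own validation would fail anyway.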
Statistical Limitations with 28 Users
Reduced confidence:
- Correlation coefficients will have wider confidence intervals
- P-values may not reach statistical significance (p < 0.05)
- Risk of overfitting to noise rather than true patterns
Minimum threshold per correlation:
- With 28 users, you need |r| > 0.37 for statistical significance (p < 0.05)
- Weaker correlations (|r| < 0.37) should be treated as exploratory
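The 0.37 figure follows from the critical value of the t-distribution at n − 2 degrees of freedom. A sketch, assuming scipy is available (`critical_r` is a hypothetical helper, not part of the skill):

```python
from math import sqrt
from scipy.stats import t

def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that reaches two-tailed significance at alpha for n pairs."""
    dof = n - 2
    t_crit = t.ppf(1 - alpha / 2, dof)  # two-tailed critical t value
    # invert t = r * sqrt(dof) / sqrt(1 - r^2) to solve for r
    return t_crit / sqrt(dof + t_crit**2)

print(round(critical_r(28), 2))  # → 0.37
```

This is why growing the dataset matters: at n = 50 the same formula gives a threshold near 0.28, so weaker real signals become detectable.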
What this means:
- If the model shows login_count has r = -0.65, that’s likely meaningful
- If it shows feature_adoption has r = -0.25, that’s too weak to trust with 28 users
Expected Model Quality
Best case scenario (strong signals):
- 2 behavioral features with clear patterns
- Correlations > 0.6 in absolute value
- Model can identify highest-risk users
- Use for: Prioritizing who to reach out to first
Realistic scenario (moderate signals):
- Correlations in 0.4-0.6 range
- Model provides rough risk ordering
- Use for: Directional insights, not precise predictions
Poor scenario (weak signals):
- Correlations < 0.4
- Model has limited predictive power
- Warning issued: “Weak correlations detected. Model may have limited predictive power.”
Recommended Approach with 28 Users
1. Run the analysis:
uv run scripts/calculate_churn_risk.py your_data.csv \
--format markdown \
--min-sample 20 \
--verbose > small_dataset_analysis.md
2. Check correlation strength:
- Look at the “Risk Signal Analysis” table
- Focus on signals with absolute correlation > 0.5
- Ignore signals with p-value > 0.05
3. Use conservatively:
- Treat results as directional insights, not precise predictions
- Focus on users scoring 80+ (clear red flags)
- Don’t over-interpret middle scores (40-60 range)
4. Validate manually:
- Review top 5 highest-risk users
- Do their behavioral patterns make intuitive sense?
- Are churned users actually scoring higher than retained?
5. Plan to iterate:
- As you gather more users (30, 50, 100+), rerun the model
- Compare how risk scores change with more data
- Update intervention thresholds based on real outcomes
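Steps 2-3 above can be scripted: compute each feature's correlation with churn and keep only the signals that clear both the significance and strength bars. A sketch assuming scipy is available (`strong_signals` is a hypothetical helper):

```python
import pandas as pd
from scipy.stats import pearsonr

def strong_signals(df: pd.DataFrame, features, alpha: float = 0.05, min_r: float = 0.5):
    """Keep only features whose correlation with churn is both significant and strong."""
    keep = {}
    for col in features:
        r, p = pearsonr(df[col], df["churned"])
        if p <= alpha and abs(r) >= min_r:  # step 2-3 filters from the checklist
            keep[col] = round(float(r), 2)
    return keep
```

Anything this filter drops should be treated as exploratory only, per the limitations above.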
Alternative: Descriptive Analysis
If you find weak correlations, don’t force a predictive model. Instead:
Manual comparison:
# What you can do with pandas directly
import pandas as pd

df = pd.read_csv("your_data.csv")  # your 28-user dataset
churned_users = df[df['churned'] == 1]
retained_users = df[df['churned'] == 0]
print("Churned users average login_count:", churned_users['login_count'].mean())
print("Retained users average login_count:", retained_users['login_count'].mean())
Look for patterns:
- Do churned users have consistently lower login_count?
- Is there a threshold (e.g., <5 logins) where most churned users fall?
- What feature_adoption level distinguishes the groups?
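The threshold question above can be answered directly: compare what share of churned vs. retained users falls below a candidate cutoff. A minimal sketch (`below_threshold_rate` is a hypothetical helper):

```python
import pandas as pd

def below_threshold_rate(df: pd.DataFrame, col: str, threshold: float):
    """Share of churned and retained users below a behavioral threshold."""
    churned = df[df["churned"] == 1]
    retained = df[df["churned"] == 0]
    # .lt(threshold).mean() = fraction of rows strictly below the cutoff
    return churned[col].lt(threshold).mean(), retained[col].lt(threshold).mean()
```

If, say, 100% of churned users but almost no retained users fall below 5 logins, that single descriptive cutoff may be more actionable than a fitted model at this sample size.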
When to Wait for More Data
DON’T run the model yet if:
- Fewer than 2 churned users (can’t identify patterns)
- All users have very similar behavior (no variance)
- You have clear data quality issues (missing values, inconsistent definitions)
Wait until:
- You have 50+ users for reliable correlations
- At least 5 churned users for stable patterns
- Consistent behavioral tracking (same definitions across all users)
Bottom Line
Yes, you can build a model with 28 users, but:
- Use --min-sample 20 to override validation
- Interpret results conservatively (directional, not definitive)
- Focus on strong signals only (|r| > 0.5)
- Validate insights manually before acting
- Plan to rerun with more data soon
Expected outcome:
You’ll get risk scores for all 28 users, but treat this as version 0.1 of your model - useful for initial prioritization, not production-grade predictions.
About This Skill
Build predictive churn risk models from behavioral data, identifying early warning signals and scoring users for proactive intervention.
More Examples
Trial User Churn Risk Analysis
Analyzes 35 trial users with 3 behavioral metrics (days_active, features_used, support_tickets) to identify churn risk signals and score all users by risk level.
200-User SaaS Churn Model for CSM Outreach
Comprehensive churn risk scoring model for 200 SaaS users with 5 behavioral metrics, including activation thresholds, tier-specific intervention playbooks, and implementation roadmap.