Small Pilot Study Proportion Comparison
Demonstrates Fisher's exact test for comparing treatment effectiveness with small samples, including power analysis guidance and honest interpretation of limitations.
Input
What you provide to the skill
Small pilot: 8 people tried treatment A (5 improved), 7 tried treatment B (4 improved). Binary outcome (improved/not). Which is more effective?
Output
What the skill generates for you
Recommended Test: Fisher’s Exact Test
Why This Test?
Based on your description:
- Binary outcome (improved/not improved)
- Two independent groups (Treatment A vs. Treatment B)
- Small sample sizes (n=8 and n=7)
- Comparing proportions between groups
- Fisher’s exact test is preferred over chi-square for small samples (the expected-count check below shows why)
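On that last point: the usual rule of thumb is that the chi-square approximation becomes unreliable once expected cell counts drop below 5. A quick sketch (not part of the main analysis) that computes the expected counts for this table:
from scipy.stats.contingency import expected_freq
import numpy as np
# Rows = treatments, columns = [improved, not improved]
table = np.array([[5, 3], [4, 3]])
print(expected_freq(table))
# [[4.8 3.2]
#  [4.2 2.8]] -- every expected count is below 5, so Fisher's exact test is the safer choice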
Your Data Summary
Treatment A: 5/8 improved (62.5%)
Treatment B: 4/7 improved (57.1%)
Absolute difference: 5.4 percentage points
Assumptions
- Independence: Each person’s outcome is independent of others ✓
- Binary outcome: Improved vs. not improved ✓
- Small sample: Fisher’s exact is designed for this scenario ✓
Python Code
from scipy.stats import fisher_exact
import numpy as np
# Data: [improved, not_improved]
treatment_a = [5, 3] # 5 improved, 3 did not
treatment_b = [4, 3] # 4 improved, 3 did not
# Create contingency table
table = np.array([treatment_a, treatment_b])
# Run Fisher's exact test (two-tailed)
odds_ratio, p_value = fisher_exact(table, alternative='two-sided')
print("=== RESULTS ===")
print(f"Treatment A: {treatment_a[0]}/{sum(treatment_a)} improved ({treatment_a[0]/sum(treatment_a):.1%})")
print(f"Treatment B: {treatment_b[0]}/{sum(treatment_b)} improved ({treatment_b[0]/sum(treatment_b):.1%})")
print(f"\nOdds Ratio: {odds_ratio:.3f}")
print(f"P-value (two-tailed): {p_value:.4f}")
if p_value < 0.05:
    print("\n✓ Statistically significant difference (p < 0.05)")
else:
    print(f"\n✗ No statistically significant difference (p = {p_value:.4f})")
    print("  With such small samples, we cannot detect a reliable difference.")
# Effect size: Absolute Risk Reduction and Relative Risk
p_a = treatment_a[0] / sum(treatment_a)
p_b = treatment_b[0] / sum(treatment_b)
arr = p_a - p_b
relative_risk = p_a / p_b
print(f"\nAbsolute Risk Difference: {arr:.1%}")
print(f"Relative Risk: {relative_risk:.3f}")
Interpretation
Statistical Significance:
With a p-value of 1.0 (the observed split is the single most likely table given these margins, so every possible table counts as "at least as extreme"), there is no statistically significant difference between the two treatments.
Why?
Your sample is extremely small (n=15 total). Fisher’s exact test correctly accounts for this:
- Treatment A: 62.5% improved
- Treatment B: 57.1% improved
- Only a 5.4 percentage point difference
With only 15 people, a gap this small arises by chance all the time, as the quick simulation below shows.
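A Monte Carlo sketch of that claim, assuming both treatments share the pooled improvement rate of 9/15 = 60% (an illustration, not part of the formal analysis):
import numpy as np
rng = np.random.default_rng(42)
n_sims = 100_000
p_pooled = 9 / 15   # pooled improvement rate under the "no difference" assumption
# Draw both groups from the same underlying rate
rate_a = rng.binomial(8, p_pooled, n_sims) / 8
rate_b = rng.binomial(7, p_pooled, n_sims) / 7
# How often does chance alone produce a gap at least as large as the observed one?
observed_gap = 5/8 - 4/7
frac = np.mean(np.abs(rate_a - rate_b) >= observed_gap)
print(f"Simulations with a gap >= observed: {frac:.0%}")  # roughly 90%+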
Effect Size
# Odds Ratio interpretation
# OR ≈ 1.25 (Treatment A has ~25% higher odds of improvement)
# But this is not statistically reliable given the small sample
# Confidence interval for proportion difference (optional)
from statsmodels.stats.proportion import confint_proportions_2indep
ci_low, ci_high = confint_proportions_2indep(
    count1=5, nobs1=8,
    count2=4, nobs2=7,
    method='wald'  # Wald is simple but anti-conservative at these sizes;
                   # 'agresti-caffo' is a sturdier small-sample choice
)
print(f"95% CI for difference: [{ci_low:.3f}, {ci_high:.3f}]")
Business/Clinical Interpretation
Conclusion: Based on this pilot data, you cannot conclude that either treatment is more effective.
Why this matters:
- The observed difference (62.5% vs 57.1%) is too small and your sample is too small to distinguish from random variation
- With only 15 people total, you’d need a much larger difference to detect it reliably
Recommendations:
- Increase sample size: Roughly 100 participants per group are needed to detect a 20-percentage-point difference; smaller effects require substantially more
- Pool data: If these treatments are similar, consider combining them versus a control
- Power analysis: Run a power calculation to determine the sample size needed to detect a meaningful difference (e.g., 20% improvement difference)
Power Analysis for Future Study
from statsmodels.stats.power import zt_ind_solve_power
# To detect a 20 percentage point difference (e.g., 50% vs 70%)
# with 80% power and alpha=0.05:
n_needed = zt_ind_solve_power(
    effect_size=0.4, # Cohen's h for 50% vs 70%
    alpha=0.05,
    power=0.8,
    alternative='two-sided'
)
print(f"Participants needed per group: {n_needed:.0f}")
# Expect: ~100 per group for moderate effects
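The effect_size=0.4 above is Cohen's h, defined as h = 2·arcsin(√p1) − 2·arcsin(√p2). A two-line check of that figure:
import numpy as np
# Cohen's h for 70% vs 50% improvement rates
h = 2 * np.arcsin(np.sqrt(0.70)) - 2 * np.arcsin(np.sqrt(0.50))
print(f"Cohen's h: {h:.3f}")  # about 0.412, hence the rounded 0.4 used above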
Reporting Template
For Academic Use (APA format):
“Fisher’s exact test revealed no significant difference in improvement rates between Treatment A (62.5%, 5/8) and Treatment B (57.1%, 4/7), p = 1.0, OR = 1.25, 95% CI [X.XX, X.XX]. The small sample size (N = 15) limited statistical power.”
For Business/Clinical Use:
“In this small pilot (n=15), Treatment A showed a slightly higher improvement rate (62.5%) than Treatment B (57.1%), but this 5.4-percentage-point difference was not statistically significant (p = 1.0). Conclusion: The data do not support choosing one treatment over the other. A larger study (≥100 participants per group) is needed to detect meaningful differences.”
Common Pitfalls to Avoid
- Don’t use chi-square with small samples – Fisher’s exact is the correct choice
- Don’t over-interpret small differences – a 5.4-point gap with n=15 is statistically indistinguishable from noise
- Don’t ignore power – Your study had low power to detect anything but huge effects
- Don’t assume “no significant difference” = “treatments are equal” – It means you don’t have enough data
Next Steps
- Run a power analysis to determine the required sample size for a full study (see the sketch below)
- Consider what size of effect would be clinically meaningful (e.g., 15% improvement difference?)
- Design a properly powered RCT if resources allow
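Picking up the first two points, a sketch of how the required sample size scales with the smallest difference you would consider meaningful (a 50% baseline improvement rate is assumed purely for illustration, reusing the same power function as above):
from statsmodels.stats.power import zt_ind_solve_power
import numpy as np
baseline = 0.50   # assumed control-arm improvement rate (illustrative)
for target in (0.60, 0.65, 0.70, 0.75):
    # Cohen's h for baseline vs. target improvement rate
    h = 2 * np.arcsin(np.sqrt(target)) - 2 * np.arcsin(np.sqrt(baseline))
    n = zt_ind_solve_power(effect_size=h, alpha=0.05, power=0.8, alternative='two-sided')
    print(f"{baseline:.0%} vs {target:.0%}: ~{int(np.ceil(n))} per group")
# Roughly 390, 170, 93, and 58 per group, respectively:
# halving the detectable difference roughly quadruples the required sample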
About This Skill
Interactive statistical advisor that helps choose the right statistical test for your research or business analysis, then provides assumption checks, Python code, effect sizes, and reporting templates.
More Examples
Regional Customer Satisfaction Analysis
Shows one-way ANOVA for comparing satisfaction scores across 4 regions, including post-hoc tests, effect size, and both business and academic reporting formats.
Comparing Two Independent Groups
Demonstrates independent samples t-test selection for comparing continuous scores between two groups, with assumption checks, effect size calculation, and APA reporting template.