Comparing Two Independent Groups
Demonstrates selection of an independent samples t-test for comparing continuous scores between two groups, with assumption checks, effect size calculation, and an APA reporting template.
Input
What you provide to the skill
I have test scores from 20 students in a traditional classroom and 20 in an online classroom. Both groups have continuous scores 0-100. Are the groups significantly different?
Output
What the skill generates for you
Recommended Test: Independent Samples t-Test
Why This Test?
Based on your description:
- ✓ ONE continuous dependent variable (test scores 0-100)
- ✓ ONE categorical independent variable (classroom type) with TWO levels (traditional vs. online)
- ✓ Independent groups (different students in each classroom)
- ✓ Adequate sample size (n=20 per group, total N=40)
Assumptions to Check
- Normality: Test scores approximately normal within each group
  - Check: Shapiro-Wilk test, Q-Q plots (see the Q-Q plot sketch after this list)
  - Robustness: the t-test is robust to moderate violations with equal group sizes
  - If violated: Mann-Whitney U test
- Homogeneity of Variance: Similar variance across groups
  - Check: Levene’s test
  - If violated: Welch’s t-test (unequal-variances version)
- Independence: Each score independent of all others (satisfied by the design)
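For the Q-Q plot check mentioned above, here is a minimal sketch using scipy.stats.probplot and matplotlib (assumed to be installed); it reuses the traditional and online arrays defined in the Python Code section below:
import matplotlib.pyplot as plt
from scipy import stats
# Q-Q plots for each group: points close to the reference line suggest approximate normality
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(traditional, dist="norm", plot=axes[0])
axes[0].set_title("Traditional classroom")
stats.probplot(online, dist="norm", plot=axes[1])
axes[1].set_title("Online classroom")
plt.tight_layout()
plt.show()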
Python Code
from scipy.stats import ttest_ind, levene, shapiro, mannwhitneyu
import numpy as np
# Your data
traditional = np.array(<your_data>) # 20 test scores from traditional classroom
online = np.array(<your_data>) # 20 test scores from online classroom
# Descriptive statistics
print(f"Traditional: M = {np.mean(traditional):.2f}, SD = {np.std(traditional, ddof=1):.2f}")
print(f"Online: M = {np.mean(online):.2f}, SD = {np.std(online, ddof=1):.2f}")
print(f"Difference: {np.mean(traditional) - np.mean(online):.2f} points")
# Check normality for each group
stat_trad, p_trad = shapiro(traditional)
stat_online, p_online = shapiro(online)
print(f"\nNormality checks:")
print(f"Traditional: p = {p_trad:.4f}")
print(f"Online: p = {p_online:.4f}")
# Check equal variance
stat_levene, p_levene = levene(traditional, online)
print(f"\nLevene's test for equal variances: p = {p_levene:.4f}")
# Decision logic for which test to use
if p_trad > 0.05 and p_online > 0.05:
    print("\n✓ Normality assumption satisfied")
    if p_levene > 0.05:
        print("✓ Equal variances assumption satisfied")
        print("\nUsing: Standard independent samples t-test")
        t_stat, p_value = ttest_ind(traditional, online)
    else:
        print("✗ Equal variances violated")
        print("\nUsing: Welch's t-test (unequal variances)")
        t_stat, p_value = ttest_ind(traditional, online, equal_var=False)
    print(f"t-statistic: {t_stat:.3f}")
else:
    print("\n✗ Normality assumption violated")
    print("\nUsing: Mann-Whitney U test (non-parametric alternative)")
    u_stat, p_value = mannwhitneyu(traditional, online, alternative='two-sided')
    print(f"U-statistic: {u_stat:.3f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("\n✓ SIGNIFICANT DIFFERENCE between classroom types")
else:
    print("\n✗ NO SIGNIFICANT DIFFERENCE between classroom types")
Effect Size: Cohen’s d
# Cohen's d for independent samples
pooled_std = np.sqrt(((len(traditional)-1)*np.var(traditional, ddof=1) +
                      (len(online)-1)*np.var(online, ddof=1)) /
                     (len(traditional) + len(online) - 2))
cohens_d = (np.mean(traditional) - np.mean(online)) / pooled_std
print(f"\nCohen's d = {cohens_d:.3f}")
# Interpretation
if abs(cohens_d) < 0.2:
    print("Effect size: Negligible")
elif abs(cohens_d) < 0.5:
    print("Effect size: Small")
elif abs(cohens_d) < 0.8:
    print("Effect size: Medium")
else:
    print("Effect size: Large")
Cohen’s d benchmarks:
- Small: 0.2
- Medium: 0.5
- Large: 0.8
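APA style also encourages a confidence interval for the mean difference. Here is a minimal sketch, assuming the standard (equal-variance) t-test path and the traditional and online arrays defined above:
from scipy import stats
import numpy as np
n1, n2 = len(traditional), len(online)
mean_diff = np.mean(traditional) - np.mean(online)
# Pooled SD and standard error of the difference (equal variances assumed)
sp = np.sqrt(((n1 - 1) * np.var(traditional, ddof=1) +
              (n2 - 1) * np.var(online, ddof=1)) / (n1 + n2 - 2))
se_diff = sp * np.sqrt(1 / n1 + 1 / n2)
# 95% CI using the t critical value with n1 + n2 - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
print(f"95% CI for the mean difference: "
      f"[{mean_diff - t_crit * se_diff:.2f}, {mean_diff + t_crit * se_diff:.2f}]")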
APA Reporting Template
If significant:
An independent samples t-test revealed that traditional classroom students
(M = XX.X, SD = X.X) scored significantly [higher/lower] than online classroom
students (M = XX.X, SD = X.X), t(38) = X.XX, p = .XXX, d = X.XX.
If not significant:
An independent samples t-test found no significant difference in test scores
between traditional classroom students (M = XX.X, SD = X.X) and online classroom
students (M = XX.X, SD = X.X), t(38) = X.XX, p = .XXX, d = X.XX.
Note: t(38) assumes the standard t-test with n = 20 per group (df = N - 2 = 38). If Welch’s t-test was used, report its adjusted (typically non-integer) degrees of freedom; if the Mann-Whitney U test was used, report U and p in place of t and df.
Common Pitfalls to Avoid
- Don’t confuse statistical and practical significance: With n=20 per group, you have moderate power. A non-significant result doesn’t prove the groups are identical; it may simply reflect insufficient power to detect a small effect.
- Report effect size alongside the p-value: The magnitude of the difference (Cohen’s d) is as important as whether it is significant.
- Check assumptions: Don’t skip the normality and equal-variance checks. If they are violated, use the appropriate alternative test.
- Two-tailed vs. one-tailed: Use a two-tailed test unless you had a specific directional hypothesis BEFORE collecting data (see the sketch after this list).
- Interpretation context: A statistically significant 2-point difference may not be educationally meaningful, while a non-significant 8-point difference might warrant further investigation with larger samples.
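If you did specify a directional hypothesis in advance (for example, that traditional scores exceed online scores; a purely hypothetical choice here), scipy.stats.ttest_ind accepts an alternative argument in SciPy 1.6 and later. A minimal sketch:
from scipy.stats import ttest_ind
# One-tailed test of the directional hypothesis "traditional > online"
# (use alternative='less' for the opposite direction; the default is 'two-sided')
t_stat, p_one_tailed = ttest_ind(traditional, online, alternative='greater')
print(f"One-tailed t-test: t = {t_stat:.3f}, p = {p_one_tailed:.4f}")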
Power Considerations
With n=20 per group (α = 0.05, two-tailed), you have roughly:
- 69% power to detect a large effect (d = 0.8)
- 34% power to detect a medium effect (d = 0.5)
- Under 10% power to detect a small effect (d = 0.2)
If you find a non-significant result, consider whether your sample size was adequate for the expected effect size.
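To reproduce these power figures, or to estimate the sample size needed for a target power, statsmodels provides TTestIndPower (assuming statsmodels is installed). A minimal sketch:
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Achieved power at n = 20 per group for small, medium, and large effects (two-sided, alpha = 0.05)
for d in (0.2, 0.5, 0.8):
    power = analysis.power(effect_size=d, nobs1=20, alpha=0.05, ratio=1.0)
    print(f"d = {d}: power = {power:.2f}")
# Per-group sample size needed for 80% power at a medium effect (d = 0.5)
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05, ratio=1.0)
print(f"n per group for 80% power at d = 0.5: {n_needed:.1f}")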
About This Skill
Interactive statistical advisor that helps choose the right statistical test for your research or business analysis, then provides assumption checks, Python code, effect sizes, and reporting templates.
More Examples
Regional Customer Satisfaction Analysis
Shows one-way ANOVA for comparing satisfaction scores across 4 regions, including post-hoc tests, effect size, and both business and academic reporting formats.
Small Pilot Study Proportion Comparison
Demonstrates Fisher's exact test for comparing treatment effectiveness with small samples, including power analysis guidance and honest interpretation of limitations.