Comparing Two Independent Groups

Demonstrates independent samples t-test selection for comparing continuous scores between two groups, with assumption checks, effect size calculation, and APA reporting template.

Input

What you provide to the skill

I have test scores from 20 students in a traditional classroom and 20 in an online classroom. Both groups have continuous scores 0-100. Are the groups significantly different?

Output

What the skill generates for you

Recommended Test: Independent Samples t-Test

Why This Test?

Based on your description:

✓ ONE continuous dependent variable (test scores 0-100)
✓ ONE categorical independent variable (classroom type) with TWO levels (traditional vs. online)
✓ Independent groups (different students in each classroom)
✓ Adequate sample size (n=20 per group, total N=40)

Assumptions to Check

Normality: Test scores approximately normal within each group
- Check: Shapiro-Wilk test, Q-Q plots
- Robustness: t-test is robust to moderate violations with equal group sizes
- If violated: Mann-Whitney U test
Homogeneity of Variance: Similar variance across groups
- Check: Levene’s test
- If violated: Welch’s t-test (unequal variances version)
Independence: Each score independent of others (satisfied by design)
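
The Q-Q check listed under Normality can also be summarized numerically. The following is a minimal sketch, assuming SciPy is installed: scipy.stats.probplot returns the ordered data paired with theoretical normal quantiles, plus the correlation r of the fitted Q-Q line, where values close to 1 suggest approximate normality. The scores list is an illustrative placeholder, not real data.

```python
from statistics import NormalDist
from scipy.stats import probplot

# Illustrative placeholder scores (evenly spaced normal quantiles), not real data
scores = [NormalDist(mu=70, sigma=10).inv_cdf((i + 0.5) / 20) for i in range(20)]

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
(theoretical_q, ordered_scores), (slope, intercept, r) = probplot(scores, dist="norm")

# r is the correlation of the Q-Q line; closer to 1 means a straighter Q-Q plot
print(f"Q-Q correlation r = {r:.3f}")
```

To draw the plot itself, probplot also accepts plot=matplotlib.pyplot. Run the check once per group, since the assumption applies within each group.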

Python Code

from scipy.stats import ttest_ind, levene, shapiro, mannwhitneyu
import numpy as np

# Your data
traditional = np.array(<your_data>)  # 20 test scores from traditional classroom
online = np.array(<your_data>)       # 20 test scores from online classroom

# Descriptive statistics
print(f"Traditional: M = {np.mean(traditional):.2f}, SD = {np.std(traditional, ddof=1):.2f}")
print(f"Online: M = {np.mean(online):.2f}, SD = {np.std(online, ddof=1):.2f}")
print(f"Difference: {np.mean(traditional) - np.mean(online):.2f} points")

# Check normality for each group
stat_trad, p_trad = shapiro(traditional)
stat_online, p_online = shapiro(online)
print(f"\nNormality checks:")
print(f"Traditional: p = {p_trad:.4f}")
print(f"Online: p = {p_online:.4f}")

# Check equal variance
stat_levene, p_levene = levene(traditional, online)
print(f"\nLevene's test for equal variances: p = {p_levene:.4f}")

# Decision logic for which test to use
if p_trad > 0.05 and p_online > 0.05:
    print("\n✓ Normality assumption satisfied")
    if p_levene > 0.05:
        print("✓ Equal variances assumption satisfied")
        print("\nUsing: Standard independent samples t-test")
        t_stat, p_value = ttest_ind(traditional, online)
    else:
        print("✗ Equal variances violated")
        print("\nUsing: Welch's t-test (unequal variances)")
        t_stat, p_value = ttest_ind(traditional, online, equal_var=False)
    # Report the test statistic for either version of the t-test
    print(f"t-statistic: {t_stat:.3f}")
else:
    print("\n✗ Normality assumption violated")
    print("\nUsing: Mann-Whitney U test (non-parametric alternative)")
    u_stat, p_value = mannwhitneyu(traditional, online, alternative='two-sided')
    print(f"U-statistic: {u_stat:.3f}")

print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("\n✓ SIGNIFICANT DIFFERENCE between classroom types")
else:
    print("\n✗ NO SIGNIFICANT DIFFERENCE between classroom types")

Effect Size: Cohen’s d

# Cohen's d for independent samples
pooled_std = np.sqrt(((len(traditional)-1)*np.var(traditional, ddof=1) + 
                      (len(online)-1)*np.var(online, ddof=1)) / 
                     (len(traditional) + len(online) - 2))

cohens_d = (np.mean(traditional) - np.mean(online)) / pooled_std

print(f"\nCohen's d = {cohens_d:.3f}")

# Interpretation
if abs(cohens_d) < 0.2:
    print("Effect size: Negligible")
elif abs(cohens_d) < 0.5:
    print("Effect size: Small")
elif abs(cohens_d) < 0.8:
    print("Effect size: Medium")
else:
    print("Effect size: Large")

Cohen’s d benchmarks:

Small: 0.2
Medium: 0.5
Large: 0.8
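
Cohen's d summarizes a difference in means, so it pairs naturally with the t-test. If the decision logic falls back to the Mann-Whitney U test, a rank-based effect size is often reported instead. The sketch below computes the rank-biserial correlation from pairwise comparisons using only the standard library. The function name rank_biserial and the sample values are hypothetical, chosen for illustration only.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation: (wins - losses) / number of pairs."""
    # A "win" is a pair where the score from group_a exceeds the score from group_b
    wins = sum(1 for a in group_a for b in group_b if a > b)
    losses = sum(1 for a in group_a for b in group_b if a < b)
    return (wins - losses) / (len(group_a) * len(group_b))

# Hypothetical scores, for illustration only
example_a = [3, 4, 5]
example_b = [1, 2, 3]
r_rb = rank_biserial(example_a, example_b)
print(f"Rank-biserial r = {r_rb:.3f}")
```

The value ranges from -1 to 1, with 0 meaning neither group tends to score higher. Ties count toward the number of pairs but toward neither wins nor losses.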

APA Reporting Template

If significant:

An independent samples t-test revealed that traditional classroom students 
(M = XX.X, SD = X.X) scored significantly [higher/lower] than online classroom 
students (M = XX.X, SD = X.X), t(38) = X.XX, p = .XXX, d = X.XX.

If not significant:

An independent samples t-test found no significant difference in test scores 
between traditional classroom students (M = XX.X, SD = X.X) and online classroom 
students (M = XX.X, SD = X.X), t(38) = X.XX, p = .XXX, d = X.XX.
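
The t(38) in these templates is the pooled-variance degrees of freedom (n1 + n2 - 2 = 38) and applies to the standard t-test. If Welch's t-test was used, the degrees of freedom come from the Welch-Satterthwaite approximation and are usually not a whole number. A minimal sketch using only the standard library follows; the function name welch_df and the sample values are hypothetical.

```python
from statistics import variance

def welch_df(group_a, group_b):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = variance(group_a) / n_a  # squared standard error of the mean, group A
    se2_b = variance(group_b) / n_b  # squared standard error of the mean, group B
    return (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))

# Hypothetical data: equal spread recovers n1 + n2 - 2, unequal spread gives fewer df
equal_spread = welch_df([1, 2, 3, 4], [2, 3, 4, 5])
unequal_spread = welch_df([1, 2, 3, 4], [0, 5, 10, 15])
print(f"Equal spread df = {equal_spread:.2f}, unequal spread df = {unequal_spread:.2f}")
```

When reporting Welch's test, replace 38 with the computed value, typically rounded to two decimals.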

Common Pitfalls to Avoid

Don’t confuse statistical and practical significance: With n=20 per group, power is limited for small and medium effects. A non-significant result doesn’t prove the groups are identical; it may simply reflect insufficient power to detect the effect.
Report effect size alongside p-value: The magnitude of difference (Cohen’s d) is as important as whether it’s significant.
Check assumptions: Don’t skip normality and equal variance checks. If violated, use the appropriate alternative.
Two-tailed vs. one-tailed: Use two-tailed unless you had a specific directional hypothesis BEFORE collecting data.
Interpretation context: A statistically significant 2-point difference may not be educationally meaningful, while a non-significant 8-point difference might warrant further investigation with larger samples.
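
For the one-tailed vs. two-tailed point, here is a short sketch of how the choice is expressed in SciPy. It assumes SciPy 1.6 or later, where ttest_ind accepts an alternative argument. The scores are hypothetical and exist only to show the relationship between the two p-values.

```python
from scipy.stats import ttest_ind

# Hypothetical scores, for illustration only
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

# Default is alternative="two-sided": H1 is that the means differ in either direction
t_two, p_two = ttest_ind(group_a, group_b)

# Directional hypothesis fixed in advance: H1 is mean(group_a) > mean(group_b)
t_one, p_one = ttest_ind(group_a, group_b, alternative="greater")

print(f"two-sided p = {p_two:.4f}, one-sided p = {p_one:.4f}")
```

When the observed difference is in the hypothesized direction, the one-sided p-value is half the two-sided one, which is why the direction must be chosen before seeing the data.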

Power Considerations

With n=20 per group and a two-tailed test at α=0.05, you have approximately:

69% power to detect a large effect (d=0.8)
34% power to detect a medium effect (d=0.5)
Low power (about 10%) to detect a small effect (d=0.2)

If you find a non-significant result, consider whether your sample size was adequate for the expected effect size. Reaching 80% power takes roughly 26 students per group for d=0.8 and roughly 64 per group for d=0.5.
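
Power figures for this design can be checked directly from the noncentral t distribution. The sketch below assumes SciPy and NumPy are installed, a two-sided test, and equal group sizes. The function name ttest_power is hypothetical.

```python
import numpy as np
from scipy.stats import nct, t

def ttest_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided independent samples t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)   # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)    # two-sided critical value
    # Probability that |T| exceeds the critical value when the true effect is d
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power = {ttest_power(d, 20):.2f}")
```

Calling ttest_power with increasing n_per_group shows how many students per group a planned study would need for a target power.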