All examples for Statistical Test Selection Guide

Regional Customer Satisfaction Analysis

Shows one-way ANOVA for comparing satisfaction scores across 4 regions, including post-hoc tests, effect size, and both business and academic reporting formats.

Input

What you provide to the skill

We tested customer satisfaction (1-10 scale) across 4 regions: North (n=35), South (n=32), East (n=38), West (n=36). Same survey, different people. Do regions differ significantly?

Output

What the skill generates for you

Recommended Test: One-Way Between-Subjects ANOVA

Why This Test?

Based on your description:

  • ✓ ONE continuous dependent variable (satisfaction scores 1-10)
  • ✓ ONE categorical independent variable (region) with FOUR levels (North, South, East, West)
  • ✓ Independent groups (different people in each region)
  • ✓ Adequate sample sizes (n=32-38 per group, total N=141)

Assumptions to Check

  1. Normality: Satisfaction scores approximately normal within each region

    • Check: Shapiro-Wilk test, Q-Q plots
    • ANOVA is robust to mild violations with balanced groups
    • If violated: Kruskal-Wallis test
  2. Homogeneity of Variance: Similar variance across regions

    • Check: Levene’s test
    • If violated: Welch’s ANOVA
  3. Independence: Each satisfaction score independent of others (satisfied by different people)
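The Welch's ANOVA fallback named in assumption 2 is mentioned but never shown, and `scipy.stats` does not provide it directly. Below is a minimal sketch built from Welch's published formula; the function name `welch_anova` and the simulated demo data are illustrative assumptions, not part of the original guide (dedicated statistics packages also provide maintained implementations):

```python
import numpy as np
from scipy.stats import f as f_dist


def welch_anova(*groups):
    """Welch's one-way ANOVA: does not assume equal variances across groups."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                                # precision weights
    grand_mean = np.sum(w * means) / np.sum(w)       # weighted grand mean
    # Between-groups component
    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    # Correction term that absorbs unequal variances
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (k ** 2 - 1)
    F = numerator / (1 + 2 * (k - 2) * lam)
    df1 = k - 1
    df2 = 1 / (3 * lam)
    p = f_dist.sf(F, df1, df2)                       # upper-tail p-value
    return F, df1, df2, p


# Illustrative call on simulated data (means/SDs are made up, not real results)
rng = np.random.default_rng(1)
groups = [rng.normal(m, s, n) for m, s, n in
          [(7.2, 0.8, 35), (6.8, 1.6, 32), (7.9, 1.1, 38), (7.1, 0.9, 36)]]
F, df1, df2, p = welch_anova(*groups)
print(f"Welch's F({df1}, {df2:.1f}) = {F:.3f}, p = {p:.4f}")
```

Note that the second degrees-of-freedom value is fractional by design; report it to one decimal place.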

Python Code

from scipy.stats import f_oneway, levene, shapiro
import numpy as np

# Your data
north = np.array(<your_data>)  # 35 satisfaction scores (1-10)
south = np.array(<your_data>)  # 32 satisfaction scores (1-10)
east = np.array(<your_data>)   # 38 satisfaction scores (1-10)
west = np.array(<your_data>)   # 36 satisfaction scores (1-10)

# Descriptive statistics
print(f"North (n=35): M = {np.mean(north):.2f}, SD = {np.std(north, ddof=1):.2f}")
print(f"South (n=32): M = {np.mean(south):.2f}, SD = {np.std(south, ddof=1):.2f}")
print(f"East (n=38): M = {np.mean(east):.2f}, SD = {np.std(east, ddof=1):.2f}")
print(f"West (n=36): M = {np.mean(west):.2f}, SD = {np.std(west, ddof=1):.2f}")

# Check normality for each region
for name, group in [("North", north), ("South", south), ("East", east), ("West", west)]:
    stat, p = shapiro(group)
    print(f"{name} normality: p = {p:.4f} {'(normal)' if p > 0.05 else '(non-normal)'}")

# Check equal variance (homogeneity of variance)
stat, p_levene = levene(north, south, east, west)
print(f"\nLevene's test for equal variance: p = {p_levene:.4f}")
if p_levene > 0.05:
    print("Assumption of equal variance satisfied")
else:
    print("WARNING: Unequal variances detected - consider Welch's ANOVA")

# Run one-way ANOVA
F_stat, p_value = f_oneway(north, south, east, west)
print("\n=== ANOVA Results ===")
print(f"F-statistic: F(3, 137) = {F_stat:.3f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("\n✓ SIGNIFICANT: Regions differ significantly in customer satisfaction")
    print("  → Proceed to post-hoc tests to identify which regions differ")
else:
    print("\n✗ NOT SIGNIFICANT: No evidence of regional differences in satisfaction")

Effect Size

# Eta-squared (proportion of variance explained by region)
all_scores = np.concatenate([north, south, east, west])
grand_mean = np.mean(all_scores)

# Sum of squares between groups
ss_between = sum(len(g) * (np.mean(g) - grand_mean)**2
                 for g in [north, south, east, west])

# Total sum of squares
ss_total = np.sum((all_scores - grand_mean)**2)

eta_squared = ss_between / ss_total
print(f"\nEta-squared = {eta_squared:.3f}")

# Interpretation (Cohen's benchmarks: .01 small, .06 medium, .14 large)
if eta_squared < 0.06:
    print("Small effect (< 6% of variance explained)")
elif eta_squared < 0.14:
    print("Medium effect (6-14% of variance explained)")
else:
    print("Large effect (≥ 14% of variance explained)")

Post-Hoc Tests (If Significant)

If the ANOVA is significant, run Tukey’s HSD to identify which specific regions differ:

from scipy.stats import tukey_hsd

# Run Tukey's HSD post-hoc test
res = tukey_hsd(north, south, east, west)
print("\n=== Tukey's HSD Post-Hoc Comparisons ===")
print(res)

# Manual pairwise comparisons with labels
import itertools
from scipy.stats import ttest_ind

regions = {'North': north, 'South': south, 'East': east, 'West': west}
print("\nPairwise comparisons:")
for reg1, reg2 in itertools.combinations(regions, 2):
    t_stat, p_val = ttest_ind(regions[reg1], regions[reg2])
    diff = np.mean(regions[reg1]) - np.mean(regions[reg2])
    print(f"{reg1} vs {reg2}: Diff = {diff:+.2f}, p = {p_val:.4f} {'*' if p_val < 0.05 else ''}")

print("\nNote: Apply Bonferroni correction (α = 0.05/6 = 0.0083) for multiple comparisons")
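The Bonferroni correction in the note above can also be applied to the p-values themselves rather than the alpha threshold: multiply each raw p-value by the number of tests and cap at 1. A quick sketch (the six raw p-values here are hypothetical placeholders, not real results):

```python
# Manual Bonferroni adjustment: multiply each raw p-value by the number of tests
raw_p = [0.003, 0.041, 0.220, 0.008, 0.510, 0.090]  # hypothetical pairwise p-values
m = len(raw_p)                                      # 6 comparisons among 4 groups
adjusted = [min(p * m, 1.0) for p in raw_p]         # cap adjusted p at 1.0
significant = [p_adj < 0.05 for p_adj in adjusted]
for raw, adj, sig in zip(raw_p, adjusted, significant):
    print(f"raw p = {raw:.3f} → adjusted p = {adj:.3f} {'*' if sig else ''}")
```

Comparing adjusted p-values against .05 is equivalent to comparing raw p-values against .05/6; both control the familywise error rate.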

Non-Parametric Alternative (If Assumptions Violated)

from scipy.stats import kruskal

# Kruskal-Wallis H-test (non-parametric alternative)
H_stat, p_value = kruskal(north, south, east, west)
print("\nKruskal-Wallis H-test:")
print(f"H-statistic: {H_stat:.3f}")
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference between regions (non-parametric test)")

Business Reporting Template

Executive Summary:

We analyzed customer satisfaction scores (1-10 scale) across four regions: North (n=35), South (n=32), East (n=38), and West (n=36).

Key Finding:

  • [If p < 0.05]: Regional differences in satisfaction are statistically significant, F(3, 137) = X.XX, p = .XXX.

    • Effect size (eta-squared = .XX) indicates that region accounts for X% of satisfaction variance.
    • Highest satisfaction: [Region] (M = X.XX)
    • Lowest satisfaction: [Region] (M = X.XX)
    • Actionable insight: Investigate practices in top-performing region for potential rollout.
  • [If p > 0.05]: No statistically significant regional differences detected (p = .XXX).

    • All regions show similar satisfaction levels (overall M = X.XX).
    • Actionable insight: Regional strategy can be standardized; differences are likely due to chance.

Academic Reporting Template (APA Format)

“A one-way between-subjects ANOVA was conducted to compare customer satisfaction scores across four geographic regions (North, South, East, West). The analysis revealed [a significant/no significant] effect of region on satisfaction scores, F(3, 137) = X.XX, p = .XXX, η² = .XX. [If significant: Post-hoc comparisons using Tukey’s HSD indicated that…]”

Common Pitfalls to Avoid

  1. Don’t run six separate t-tests (North vs South, North vs East, etc.)

    • This inflates Type I error rate to ~26% instead of 5%
    • Use ANOVA first, then post-hoc tests if significant
  2. Don’t ignore effect size

    • p < 0.05 tells you the observed differences are unlikely to be due to chance alone; it says nothing about their size
    • Eta-squared tells you whether the differences are large enough to be meaningful
    • With large samples, tiny differences can be “significant” but unimportant
  3. Don’t confuse statistical and practical significance

    • A difference of 0.3 points on a 10-point scale might be statistically significant but practically irrelevant
    • Consider: What size difference would actually change business decisions?
  4. Don’t forget assumption checks

    • ANOVA is fairly robust, but severely violated assumptions require alternatives
    • With nearly equal sample sizes (32-38), homogeneity violations are less concerning
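The ~26% figure in pitfall 1 comes from the familywise error formula 1 − (1 − α)^m for m independent tests; a quick check:

```python
# Familywise Type I error rate for m independent tests at alpha = .05
alpha = 0.05
m = 6  # number of pairwise comparisons among 4 groups: C(4, 2) = 6
familywise = 1 - (1 - alpha) ** m
print(f"P(at least one false positive) = {familywise:.3f}")  # → 0.265
```

This is why an omnibus ANOVA (or a corrected post-hoc procedure) is needed instead of six uncorrected t-tests.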

Next Steps

  1. Run assumption checks first (normality, equal variance)
  2. Execute ANOVA with your actual data
  3. If significant: Run Tukey’s HSD to identify which regions differ
  4. Calculate effect size to assess practical importance
  5. Interpret in business context: What would you do differently based on results?
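The five steps above can be sketched as one end-to-end script. The data here are simulated with made-up group means so the sketch runs standalone; substitute your real arrays before drawing any conclusions:

```python
import numpy as np
from scipy.stats import shapiro, levene, f_oneway, tukey_hsd

rng = np.random.default_rng(42)
# Simulated satisfaction scores (illustrative means/SDs - replace with real data)
groups = {
    "North": rng.normal(7.2, 1.1, 35),
    "South": rng.normal(6.8, 1.2, 32),
    "East":  rng.normal(7.9, 1.0, 38),
    "West":  rng.normal(7.1, 1.1, 36),
}
samples = list(groups.values())

# 1. Assumption checks: normality per group, homogeneity of variance
normal_ok = all(shapiro(g).pvalue > 0.05 for g in samples)
_, p_levene = levene(*samples)
print(f"Normality OK in all groups: {normal_ok}; Levene p = {p_levene:.4f}")

# 2. Omnibus one-way ANOVA
F, p = f_oneway(*samples)

# 3. Post-hoc comparisons only if the omnibus test is significant
if p < 0.05:
    print(tukey_hsd(*samples))

# 4. Effect size (eta-squared)
all_scores = np.concatenate(samples)
grand = np.mean(all_scores)
ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in samples)
ss_total = np.sum((all_scores - grand) ** 2)
print(f"F = {F:.3f}, p = {p:.4f}, eta^2 = {ss_between / ss_total:.3f}")

# 5. Interpret in context: is the largest group difference big enough to act on?
```

If the normality or variance checks in step 1 fail badly, branch to the Kruskal-Wallis or Welch alternatives shown earlier instead of the standard ANOVA.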