4-Wave Longitudinal Depression Study
Analyze the monotone attrition pattern in a longitudinal study with 29.5% cumulative dropout, recommending FIML growth curve modeling with lavaan and providing complete implementation code.
Input
4-wave longitudinal depression study, n=200 baseline. Wave 2: n=178 (11% loss), Wave 3: n=165 (17.5% loss), Wave 4: n=141 (29.5% total loss). Outcome: depression severity. Predictors: treatment arm, age, gender.
Output
Missing Data Pattern Analysis Report
1. Missing Data Summary
| Wave | n | % Retained (of baseline) | Lost Since Previous Wave, n (%) | Cumulative Loss |
|---|---|---|---|---|
| T1 | 200 | 100% | 0 (0%) | 0% |
| T2 | 178 | 89% | 22 (11.0%) | 11% |
| T3 | 165 | 82.5% | 13 (7.3%) | 17.5% |
| T4 | 141 | 70.5% | 24 (14.5%) | 29.5% |
Total Attrition: 59 participants (29.5%)
Complete Cases (all 4 waves): 141 (70.5%)
Partial Data Cases: 59 (29.5%)
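The table's percentages follow directly from the wave sizes. The minimal sketch below recomputes them. It keeps wave-to-wave loss, which is relative to the previous wave, separate from cumulative loss, which is relative to baseline. The function name `attrition_table` is hypothetical.

```python
def attrition_table(counts):
    """Per-wave retention and loss from wave sample sizes (baseline first)."""
    baseline = counts[0]
    rows = []
    for k, n in enumerate(counts):
        prev = counts[k - 1] if k > 0 else n          # T1 has no previous wave
        lost = prev - n
        rows.append({
            "wave": f"T{k + 1}",
            "n": n,
            "pct_retained": round(100 * n / baseline, 1),
            "lost_since_prev": lost,
            "pct_lost_since_prev": round(100 * lost / prev, 1),
            "pct_cumulative_loss": round(100 * (baseline - n) / baseline, 1),
        })
    return rows


for row in attrition_table([200, 178, 165, 141]):
    print(row)
```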
Pattern Classification: Monotone (Sequential Dropout)
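In a monotone pattern, a participant who misses Wave k remains missing at all subsequent waves, which is characteristic of longitudinal dropout. Monotone patterns are comparatively easy to handle: each participant's missingness is summarized by a single dropout wave, and both FIML and multiple imputation accommodate them well. The wave-level counts alone cannot confirm monotonicity, however. It should be verified in the participant-level data by checking that no one returns after a missed wave.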
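Monotonicity can be checked per participant. The sketch below assumes a wide-format layout: one row per participant, columns `dep_t1` to `dep_t4`, and NaN for a missed wave. `classify_missingness` is a hypothetical helper. A dataset is monotone when no row is labeled `intermittent`.

```python
import numpy as np
import pandas as pd


def classify_missingness(df, wave_cols):
    """Label each row 'complete', 'monotone' (never returns after a miss), or 'intermittent'."""
    observed = df[wave_cols].notna().to_numpy()
    labels = []
    for row in observed:
        if row.all():
            labels.append("complete")
        elif row[int(np.argmin(row)):].any():
            # something observed after the first missed wave: participant came back
            labels.append("intermittent")
        else:
            labels.append("monotone")
    return pd.Series(labels, index=df.index, name="pattern")


waves = ["dep_t1", "dep_t2", "dep_t3", "dep_t4"]
toy = pd.DataFrame({                      # hypothetical example rows
    "dep_t1": [10, 12, 9, 14],
    "dep_t2": [11, np.nan, 8, np.nan],
    "dep_t3": [9, np.nan, np.nan, 13],
    "dep_t4": [8, np.nan, np.nan, np.nan],
})
print(classify_missingness(toy, waves).value_counts())
```

For the study data, the complete and monotone counts should match the 141 completers and 59 dropouts above only if no row comes out intermittent.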
2. Missingness Pattern Visualization
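```python
import matplotlib.pyplot as plt
import missingno as msno

# `data`: wide-format pandas DataFrame, one row per participant
waves = ["dep_t1", "dep_t2", "dep_t3", "dep_t4"]

# Matrix plot: white gaps mark missed waves; sorting rows by completeness
# makes a monotone dropout "staircase" easy to see
ax = msno.matrix(data[waves], sort="descending")
ax.set_title("Longitudinal Missing Data Pattern (n=200)")  # title the main axes, not the sparkline
plt.show()

# Bar plot: non-missing values per wave
ax = msno.bar(data[waves])
ax.set_title("Percent Present by Wave")
plt.show()
```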
3. Mechanism Testing
Attrition Analysis Tests
Test 1: Differential Attrition by Treatment Arm
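```r
# Dropout indicator: 1 = missing at Wave 4, 0 = observed at Wave 4
data$dropout <- ifelse(is.na(data$dep_t4), 1, 0)
# Chi-square test for differential dropout by treatment arm
table_dropout <- table(data$treatment, data$dropout)
chisq.test(table_dropout)
```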
Test 2: Baseline Depression Predicts Dropout
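```r
# Logistic regression: Does baseline depression predict attrition?
data$dropout <- ifelse(is.na(data$dep_t4), 1, 0)
attrition_model <- glm(dropout ~ dep_t1 + treatment + age + gender,
                       data = data, family = binomial)
summary(attrition_model)
# Odds ratios with 95% profile-likelihood confidence intervals
exp(cbind(OR = coef(attrition_model), confint(attrition_model)))
```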
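`summary()` reports glm coefficients on the log-odds scale. Exponentiating a coefficient gives an odds ratio, and rescaling it first gives the odds ratio for a larger, more interpretable increase. The small sketch below uses made-up estimates: `beta_dep` and `se_dep` are hypothetical, not results from this study. It uses Wald intervals, which can differ slightly from the profile-likelihood intervals that `confint()` gives for a glm.

```python
import math


def odds_ratio(beta, se, scale=1.0, z=1.96):
    """Odds ratio and Wald 95% CI for a `scale`-unit increase in a predictor."""
    b, s = beta * scale, se * scale
    return math.exp(b), math.exp(b - z * s), math.exp(b + z * s)


# Hypothetical dep_t1 estimate on the log-odds scale (per 1 point), NOT study results
beta_dep, se_dep = 0.113, 0.05
print("per 1 point:   OR = %.2f (95%% CI %.2f to %.2f)" % odds_ratio(beta_dep, se_dep))
print("per 10 points: OR = %.2f (95%% CI %.2f to %.2f)" % odds_ratio(beta_dep, se_dep, scale=10))
```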
Test 3: Compare Completers vs. Dropouts
```r
# Welch t-tests comparing completers (0) and dropouts (1) at baseline
t.test(dep_t1 ~ dropout, data = data)
t.test(age ~ dropout, data = data)
# For categorical variables (gender)
chisq.test(table(data$gender, data$dropout))
```
4. Mechanism Determination
Likely Mechanism: MAR (Missing at Random)
Reasoning:
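- Monotone pattern: Typical of longitudinal dropout and compatible with MAR, although the pattern by itself does not identify the mechanism
- Plausibly predictable from observed data: Attrition is hypothesized to relate to:
  - Baseline depression severity (higher severity → more dropout)
  - Treatment arm assignment (differential engagement)
  - Demographics (age, gender)
- MCAR cannot be assumed: The attrition rate and the monotone pattern do not distinguish MCAR from MAR, because purely random dropout can produce any rate. MCAR is rejected if dropout is related to observed variables in Tests 1-3.
- MNAR cannot be ruled out: If dropout depends on unobserved depression values even after conditioning on the observed data (e.g., participants leave when symptoms worsen between assessments), the mechanism is MNAR. Observed data cannot distinguish MAR from MNAR, so this must be judged on theoretical grounds and probed with sensitivity analyses rather than statistical tests.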
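The small simulation sketch below shows why the attrition rate alone cannot identify the mechanism. It uses numpy only; all scores are made up, and `simulate_dropout` is a hypothetical helper. It produces exactly 59 dropouts (29.5%) under both MCAR and MAR from the same baseline scores. Only the comparison with observed baseline data tells the two apart, which is what the attrition tests examine.

```python
import numpy as np


def simulate_dropout(mechanism, n=200, n_dropout=59, seed=2024):
    """Hypothetical baseline scores plus a dropout flag with exactly n_dropout dropouts."""
    rng = np.random.default_rng(seed)
    dep_t1 = rng.normal(20, 5, size=n)      # same draws for both mechanisms (same seed)
    if mechanism == "MCAR":
        weights = np.ones(n)                # everyone equally likely to drop out
    elif mechanism == "MAR":
        z = (dep_t1 - dep_t1.mean()) / dep_t1.std()
        weights = np.exp(1.5 * z)           # higher observed baseline -> more likely to drop out
    else:
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")
    idx = rng.choice(n, size=n_dropout, replace=False, p=weights / weights.sum())
    dropout = np.zeros(n, dtype=bool)
    dropout[idx] = True
    return dep_t1, dropout


for mech in ("MCAR", "MAR"):
    dep_t1, dropout = simulate_dropout(mech)
    gap = dep_t1[dropout].mean() - dep_t1[~dropout].mean()
    print(f"{mech}: dropout {dropout.mean():.1%}, baseline gap (dropouts - completers) {gap:+.2f}")
```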
Little’s MCAR Test:
```r
library(naniar)
mcar_test(data[, c("dep_t1", "dep_t2", "dep_t3", "dep_t4")])
```
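Interpretation: p < .05 rejects the MCAR hypothesis. A non-significant result does not establish MCAR, because the test has limited power and cannot rule out MNAR.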
5. Recommended Handling Strategy
Primary Recommendation: Full Information Maximum Likelihood (FIML) via Latent Growth Curve Model
Rationale:
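- Widely recommended: A standard approach for longitudinal trajectory analysis with missing data
- Uses all available data: All 200 participants contribute to estimates (even those with partial data)
- Unbiased under MAR: Valid (consistent in large samples) when attrition depends only on variables included in the model; predictors of dropout that are not in the model should be added as auxiliary variables
- Models change over time: Explicitly estimates intercept (starting level) and slope (rate of change)
- Efficient: With the same variables, FIML and multiple imputation are asymptotically equivalent; FIML gives a single deterministic result with no simulation error from a finite number of imputations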
- Treatment effect on trajectory: Can test whether treatment affects initial depression (intercept) and/or rate of improvement (slope)
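The core of FIML is the observed-data log-likelihood: each participant contributes the multivariate normal log-density of only the waves they actually completed. The minimal numpy sketch below computes that quantity for a given mean vector `mu` and covariance matrix `sigma`. `fiml_loglik` is a hypothetical name, and the parameter values are made up. lavaan maximizes this sum over the model-implied mean and covariance rather than taking them as given.

```python
import numpy as np


def fiml_loglik(data, mu, sigma):
    """Observed-data (FIML) log-likelihood of rows in `data` under N(mu, sigma).

    NaN marks a missed wave. Each row contributes the log-density of its observed
    entries only, using the matching sub-vector of mu and sub-matrix of sigma.
    """
    data = np.asarray(data, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)
        k = int(obs.sum())
        if k == 0:
            continue                         # nothing observed: no contribution
        diff = row[obs] - mu[obs]
        s_obs = sigma[np.ix_(obs, obs)]
        _, logdet = np.linalg.slogdet(s_obs)
        quad = diff @ np.linalg.solve(s_obs, diff)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
    return total


# Hypothetical parameters; the second participant dropped out after Wave 2
mu = np.array([20.0, 18.5, 17.0, 15.5])
sigma = 4.0 * np.eye(4) + 16.0               # compound-symmetric covariance
rows = [[21.0, 19.0, 18.0, 16.0],
        [25.0, 24.0, np.nan, np.nan]]
print(fiml_loglik(rows, mu, sigma))
```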
6. Implementation Code
R: FIML Latent Growth Curve Model (lavaan)
```r
library(lavaan)
# Define linear growth model
# Intercept (i): depression level at Wave 1 (time score 0)
# Slope (s): change per wave; time scores 0-3 assume equally spaced waves
# treatment and gender should be coded numerically (e.g., 0/1)
model <- '
  # Growth factors
  i =~ 1*dep_t1 + 1*dep_t2 + 1*dep_t3 + 1*dep_t4
  s =~ 0*dep_t1 + 1*dep_t2 + 2*dep_t3 + 3*dep_t4
  # Predictors of intercept
  i ~ treatment + age + gender
  # Predictors of slope (treatment effect on change)
  s ~ treatment + age + gender
'
# Fit model with FIML (uses every observed depression score). If covariates
# also have missing values, add fixed.x = FALSE so those cases are retained.
fit <- growth(model, data = data, missing = "fiml")
# Results
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Key outputs:
# - i ~ treatment: Baseline difference between arms (expected near zero if randomized)
# - s ~ treatment: Does treatment affect rate of change? (PRIMARY HYPOTHESIS)
# - Intercepts of i and s: Expected baseline level and rate of change when all
#   predictors equal 0 (center age so these are interpretable)
# - Model fit: conventional guidelines are CFI > .95, RMSEA < .06, SRMR < .08
```
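The fixed loadings determine what the growth model implies for the four waves: means Λα and covariances ΛΨΛᵀ + Θ, where the rows of Λ are (1, 0), (1, 1), (1, 2), (1, 3). The sketch below covers the unconditional model (no predictors) with hypothetical parameter values. It shows how an intercept mean and a slope mean become wave-by-wave expected depression scores. The function name `implied_moments` is illustrative.

```python
import numpy as np

# Loadings from the lavaan model: intercept fixed to 1, slope to 0, 1, 2, 3
LAMBDA = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)


def implied_moments(alpha, psi, theta):
    """Model-implied means and covariances of dep_t1..dep_t4 for a linear growth model.

    alpha: means of (i, s); psi: 2x2 covariance of (i, s); theta: residual variances.
    """
    mu = LAMBDA @ np.asarray(alpha, dtype=float)
    sigma = LAMBDA @ np.asarray(psi, dtype=float) @ LAMBDA.T + np.diag(theta)
    return mu, sigma


# Hypothetical values: baseline mean 20, average change of -1.5 points per wave
mu, sigma = implied_moments(alpha=[20.0, -1.5],
                            psi=[[16.0, -1.0], [-1.0, 1.0]],
                            theta=[4.0, 4.0, 4.0, 4.0])
print("implied means:", mu)
print("implied variances:", np.diag(sigma))
```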
Alternative: Mixed Effects Model (lme4)
```r
library(lme4)
library(lmerTest)  # For p-values (Satterthwaite df)
# Restructure to long format: one row per participant-wave
long_data <- reshape(data,
                     varying   = c("dep_t1", "dep_t2", "dep_t3", "dep_t4"),
                     v.names   = "depression",
                     timevar   = "wave",
                     times     = c(0, 1, 2, 3),
                     idvar     = "ID",
                     direction = "long")
# Mixed effects model
# Random intercept and slope for each participant; rows with missing depression
# are dropped, but every observed wave still contributes (valid under MAR)
model_lmer <- lmer(depression ~ wave*treatment + age + gender + (wave | ID),
                   data = long_data)
summary(model_lmer)
# Key effects:
# - wave: Average change over time (slope) in the reference arm
# - treatment: Baseline difference (should be ~0 in RCT)
# - wave:treatment: Treatment effect on slope (PRIMARY HYPOTHESIS)
```
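With wave coded 0 at baseline, the `wave * treatment` fixed effects combine into one line per arm. The control slope is the `wave` coefficient, and the treated slope is `wave` plus `wave:treatment`. The sketch below makes the arithmetic explicit using hypothetical coefficient values, with covariates held at 0. The dictionary keys mimic lme4's coefficient names, and `predicted_mean` is an illustrative helper.

```python
def predicted_mean(wave, treatment, b):
    """Fixed-effects prediction from depression ~ wave * treatment, other covariates at 0."""
    return (b["(Intercept)"]
            + b["wave"] * wave
            + b["treatment"] * treatment
            + b["wave:treatment"] * wave * treatment)


# Hypothetical coefficients (treatment: 0 = control, 1 = treated), NOT study results
b = {"(Intercept)": 20.0, "wave": -1.0, "treatment": 0.2, "wave:treatment": -1.5}
for arm in (0, 1):
    trajectory = [round(predicted_mean(t, arm, b), 2) for t in range(4)]
    print(f"arm {arm}: {trajectory}")
# Per-wave slope: control = b["wave"]; treated = b["wave"] + b["wave:treatment"]
```

The between-arm gap at wave t is b_treatment + t × b_wave:treatment, so a negative `wave:treatment` coefficient means the arms diverge over time.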
Python: Mixed Effects Model (statsmodels)
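```python
import pandas as pd
import statsmodels.formula.api as smf

# Reshape to long format: dep_t1..dep_t4 -> column "dep_t", suffix 1-4 -> "wave"
data_long = pd.wide_to_long(data, stubnames="dep_t", i="ID", j="wave", sep="")
data_long = data_long.reset_index().rename(columns={"dep_t": "depression"})
data_long["wave"] = data_long["wave"] - 1      # code time 0-3, as in the R models

# Drop missed assessments; each participant's observed waves still contribute
data_long = data_long.dropna(subset=["depression"])

# Mixed effects model: random intercept and random slope for wave per participant
model = smf.mixedlm("depression ~ wave * treatment + age + gender",
                    data=data_long,
                    groups="ID",
                    re_formula="~wave")
result = model.fit()
print(result.summary())
```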
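The code can be tried without the study data. The hypothetical generator below, `simulate_wide`, uses made-up parameter values. It produces a wide DataFrame with the same column names (`ID`, `treatment`, `age`, `gender`, `dep_t1` to `dep_t4`). Its dropout is monotone and depends only on the observed baseline score, so it is MAR by construction. It is a sketch for checking the reshape and model-fitting steps, not a model of the actual study.

```python
import numpy as np
import pandas as pd


def simulate_wide(n=200, seed=7):
    """Hypothetical 4-wave wide data with monotone, MAR (baseline-driven) dropout."""
    rng = np.random.default_rng(seed)
    treatment = rng.integers(0, 2, n)                 # 0 = control, 1 = treatment
    age = rng.normal(40, 12, n).round()
    gender = rng.integers(0, 2, n)
    intercept = rng.normal(20, 4, n)                  # person-specific baseline level
    slope = rng.normal(-1.0 - 1.5 * treatment, 1.0)   # steeper decline when treated
    dep = intercept[:, None] + slope[:, None] * np.arange(4) + rng.normal(0, 2, (n, 4))

    # Monotone dropout: at each later wave, active participants leave with a
    # probability that rises with their observed baseline score (MAR)
    hazard = 1 / (1 + np.exp(-(-2.2 + 0.15 * (dep[:, 0] - 20))))
    active = np.ones(n, dtype=bool)
    for t in range(1, 4):
        active &= rng.random(n) >= hazard
        dep[~active, t] = np.nan                       # once gone, stays missing

    data = pd.DataFrame({"ID": np.arange(1, n + 1), "treatment": treatment,
                         "age": age, "gender": gender})
    for t in range(4):
        data[f"dep_t{t + 1}"] = dep[:, t]
    return data


data = simulate_wide()
print(data.head())
```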
7. Methods Section Text (Publication-Ready)
“Missing Data Analysis. Of 200 participants enrolled at baseline, 178 (89%) completed Wave 2, 165 (82.5%) completed Wave 3, and 141 (70.5%) completed Wave 4, resulting in 29.5% cumulative attrition. The missing data pattern was monotone, consistent with sequential dropout typical in longitudinal research.
Logistic regression analysis examined predictors of attrition by Wave 4. [Report results: e.g., “Higher baseline depression severity predicted dropout, OR = 1.12, p = .03, indicating that attrition was related to observed variables (i.e., not MCAR), consistent with a MAR mechanism.”] Chi-square tests revealed [differential attrition results].
To address missing data, we employed Full Information Maximum Likelihood (FIML) estimation via latent growth curve modeling in lavaan (Rosseel, 2012). FIML uses all available data from all 200 participants and provides unbiased parameter estimates under the Missing at Random (MAR) assumption when variables related to attrition are included in the model (Enders, 2010). Growth models estimated an intercept (baseline depression level) and linear slope (rate of change) with treatment arm, age, and gender as predictors. Model fit was evaluated using CFI, RMSEA, and SRMR.
Sensitivity Analysis. As a robustness check, we conducted multiple imputation (m = 30 imputations) using predictive mean matching in the mice package (van Buuren & Groothuis-Oudshoorn, 2011), re-estimated mixed effects models in each imputed dataset, and pooled estimates using Rubin’s (1987) rules. [Report whether results were consistent across methods.]”
8. Sensitivity Analysis Plan
Primary Analysis
- FIML growth curve model (lavaan) with MAR assumption
Sensitivity Analyses
Sensitivity 1: Complete Case Analysis (n=141)
```r
# Only participants with all 4 waves; unbiased in general only under MCAR
complete_data <- data[complete.cases(data[, c("dep_t1", "dep_t2", "dep_t3", "dep_t4")]), ]
fit_complete <- growth(model, data = complete_data)
summary(fit_complete, fit.measures = TRUE)
```
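Complete-case analysis is unbiased in general only under MCAR. The simulation sketch below uses hypothetical parameters and a large n so that sampling noise is negligible. Wave-4 dropout depends only on the observed baseline score, and higher scores are more likely to drop out. As a result, the completers' Wave-4 mean is biased downward. Conditioning on the baseline score recovers the full-sample mean. Here this is done through a simple regression prediction, which stands in for likelihood-based methods such as FIML and is not lavaan's estimator.

```python
import numpy as np


def complete_case_bias(n=100_000, seed=3):
    """Simulate MAR dropout at Wave 4 and compare three estimates of the Wave-4 mean."""
    rng = np.random.default_rng(seed)
    dep_t1 = rng.normal(20, 5, n)                       # baseline, always observed
    dep_t4 = 2 + 0.6 * dep_t1 + rng.normal(0, 3, n)     # Wave-4 score tracks baseline
    # MAR: dropout probability depends only on the observed baseline score
    p_drop = 1 / (1 + np.exp(-(-0.9 + 0.2 * (dep_t1 - 20))))
    completer = rng.random(n) >= p_drop

    true_mean = dep_t4.mean()                           # uses values we would not see
    cc_mean = dep_t4[completer].mean()                  # complete-case estimate
    # Regress dep_t4 on dep_t1 among completers, then average predictions over everyone
    slope, intercept = np.polyfit(dep_t1[completer], dep_t4[completer], 1)
    adjusted_mean = intercept + slope * dep_t1.mean()
    return true_mean, cc_mean, adjusted_mean


true_mean, cc_mean, adjusted_mean = complete_case_bias()
print(f"true {true_mean:.2f} | complete cases {cc_mean:.2f} | baseline-adjusted {adjusted_mean:.2f}")
```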
Sensitivity 2: Multiple Imputation (m=30)
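```r
library(mice)
# Impute the wide-format data; "pmm" = predictive mean matching
imp <- mice(data[, c("dep_t1", "dep_t2", "dep_t3", "dep_t4",
                     "treatment", "age", "gender")],
            m = 30, method = "pmm", seed = 12345, maxit = 20)
plot(imp)  # trace plots: chains should mix well with no trend across iterations
# Next: reshape each completed dataset (complete(imp, k)) to long format,
# refit the mixed effects model, and combine the 30 fits with pool()
```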
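Each completed dataset yields its own estimate and standard error, which are combined with Rubin's (1987) rules. The pooled estimate is the mean of the m estimates. The total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The standard-library sketch below pools one parameter; `pool_rubin` is a hypothetical name, and the inputs are made up.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations using Rubin's rules.

    estimates: the m point estimates; variances: the m squared standard errors.
    """
    m = len(estimates)
    q_bar = mean(estimates)               # pooled point estimate
    w = mean(variances)                   # average within-imputation variance
    b = variance(estimates)               # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    r = (1 + 1 / m) * b / w               # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df}


# Hypothetical slope estimates and squared SEs from m = 3 imputations
print(pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04]))
```

mice's `pool()` applies the same combining rules to fitted models. By default it also adjusts the degrees of freedom for small samples, so its df will differ from this sketch.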
Sensitivity 3: Pattern-Mixture Model (if results sensitive)
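```r
# Dropout indicator coded numerically for lavaan: 1 = missing at Wave 4, 0 = completer
data$dropout_group <- ifelse(is.na(data$dep_t4), 1, 0)
model_pmm <- '
  i =~ 1*dep_t1 + 1*dep_t2 + 1*dep_t3 + 1*dep_t4
  s =~ 0*dep_t1 + 1*dep_t2 + 2*dep_t3 + 3*dep_t4
  i ~ treatment + age + gender + dropout_group
  s ~ treatment + age + gender + dropout_group
'
fit_pmm <- growth(model_pmm, data = data, missing = "fiml")
summary(fit_pmm)
# A significant s ~ dropout_group means trajectories differ by dropout pattern;
# compare the s ~ treatment estimate with the primary FIML model
```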
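In a pattern-mixture approach, the overall trajectory is a mixture of pattern-specific trajectories, weighted by how common each pattern is. The sketch below shows that averaging step for two patterns, using this study's pattern counts (141 completers, 59 dropouts) and hypothetical pattern-specific slopes. `pattern_mixture_average` is an illustrative helper. It ignores uncertainty in the pattern shares and assumes the slopes are compared at the same covariate values.

```python
def pattern_mixture_average(pattern_estimates, pattern_counts):
    """Overall estimate as the count-weighted average of pattern-specific estimates."""
    total = sum(pattern_counts.values())
    return sum(pattern_estimates[k] * pattern_counts[k] / total for k in pattern_estimates)


counts = {"Completer": 141, "Dropout": 59}    # pattern sizes in this study
slopes = {"Completer": -2.0, "Dropout": -0.5}  # hypothetical pattern-specific slopes
print(pattern_mixture_average(slopes, counts))
```

In the lavaan model above, the `s ~ dropout_group` coefficient is the difference between the two pattern-specific slopes at the same covariate values.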
Summary
| Component | Finding/Recommendation |
|---|---|
| Missing Data % | 29.5% by Wave 4 |
| Pattern | Monotone (sequential dropout) |
| Likely Mechanism | MAR (working assumption; check whether baseline depression, treatment arm, and demographics predict dropout) |
| Recommended Method | FIML latent growth curve model (lavaan) |
| Sensitivity Analyses | Complete case, multiple imputation, pattern-mixture |
| Sample Used | All 200 participants (FIML uses partial data) |
References
- Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
- Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
- Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1-36.
- van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67.