Missing Data Pattern Analysis: RCT Depression Trial

1. Missing Data Summary

Variable	Control Arm (n=150)	Treatment Arm (n=150)	Total (n=300)
Baseline depression	0 (0%)	0 (0%)	0 (0%)
Week 12 depression	45 (30%)	23 (15%)	68 (22.7%)
Anxiety (baseline)	0 (0%)	0 (0%)	0 (0%)

Completers: 232/300 (77.3%)
Pattern: Monotone (sequential dropout at Week 12)
Differential Attrition: χ²(1) = 10.23, p = 0.001

2. Pattern Classification

Monotone Missing Data Pattern

All participants have baseline data
Attrition occurs at Week 12 endpoint
No missingness prior to dropout (sequential)
Common in clinical trials with protocol discontinuation

3. Mechanism Assessment

Differential Attrition Analysis

# Test for differential dropout by treatment arm
attrition_table <- table(data$treatment, data$week12_missing)
chisq.test(attrition_table)
# χ²(1) = 10.23, p = 0.001

Result: Significantly higher dropout in control (30%) vs. treatment (15%).

Predictors of Missingness

Control Arm:

model_control <- glm(dropout ~ baseline_depression + baseline_anxiety,
                    data = subset(data, treatment == "control"),
                    family = binomial)

Results:

Baseline depression: OR = 1.02, p = 0.45 (not significant)
Baseline anxiety: OR = 1.18, p = 0.03 (significant predictor)

Treatment Arm:

model_treatment <- glm(dropout ~ baseline_depression + baseline_anxiety,
                      data = subset(data, treatment == "treatment"),
                      family = binomial)

Results:

Baseline depression: OR = 1.01, p = 0.52
Baseline anxiety: OR = 1.05, p = 0.32 (not significant)

Baseline Comparability

Between-Group at Baseline:

t.test(baseline_depression ~ treatment, data = data)
# t = 1.35, df = 298, p = 0.18 (groups similar)

4. Mechanism Determination

Mechanism: MAR (Missing at Random)

Reasoning:

Not MCAR: Differential attrition by treatment arm (30% vs. 15%) indicates non-random missingness
MAR Evidence:
- Missingness predicted by observed baseline anxiety (in control arm)
- Baseline depression similar across groups (p = 0.18)
- Differential dropout likely due to treatment effectiveness (observable)
- Higher control dropout suggests lack of perceived benefit
Not MNAR: No evidence that missing outcome values themselves drive dropout

Clinical Interpretation: Control participants with higher anxiety dropped out more frequently, possibly due to lack of symptom improvement. Treatment reduces dropout by providing benefit. This is MAR because missingness depends on observed characteristics.

5. Recommended Handling Strategy

Primary Analysis: Multiple Imputation (Intent-to-Treat)

Rationale:

22.7% missing exceeds 5% threshold - Complete case analysis would lose power and bias
Differential attrition requires ITT - Per-protocol analysis would bias treatment effect
MAR mechanism - MI provides valid estimates under MAR
Preserves randomization - All 300 participants retained in analysis
Regulatory standard - FDA/EMA recommend MI for ITT in trials with >5% dropout

Specification:

Number of imputations: m = 40 (for 23% missing, use m >= 100 x % missing)
Imputation model: Include treatment, baseline depression, baseline anxiety, and auxiliary variables
Method: Predictive mean matching (PMM) for continuous outcomes
Analysis: Pool results using Rubin’s rules

6. Implementation Code

R Implementation (mice package)

library(mice)
library(tidyverse)

# Prepare data
data <- data %>%
  mutate(
    dropout = is.na(week12_depression),
    treatment_num = ifelse(treatment == "treatment", 1, 0)
  )

# Multiple Imputation
set.seed(2025)
imp <- mice(
  data %>% select(treatment_num, baseline_depression, baseline_anxiety, week12_depression),
  m = 40,
  method = "pmm",
  maxit = 20,
  printFlag = FALSE
)

# Check convergence
plot(imp)
densityplot(imp)

# Primary ITT Analysis: Treatment effect on Week 12 depression
fit <- with(imp, lm(week12_depression ~ treatment_num + baseline_depression + baseline_anxiety))

# Pool results
pooled <- pool(fit)
summary(pooled, conf.int = TRUE)

# Extract treatment effect
treatment_effect <- summary(pooled)[2, ]
cat(sprintf("Treatment Effect: beta = %.2f, 95%% CI [%.2f, %.2f], p = %.3f",
            treatment_effect$estimate,
            treatment_effect$`2.5 %`,
            treatment_effect$`97.5 %`,
            treatment_effect$p.value))

Python Implementation (sklearn)

import pandas as pd
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

# Load data
data = pd.read_csv('trial_data.csv')

# Prepare for imputation
X_impute = data[['treatment', 'baseline_depression', 'baseline_anxiety', 'week12_depression']]
X_impute['treatment'] = X_impute['treatment'].map({'control': 0, 'treatment': 1})

# Iterative Imputation
imputer = IterativeImputer(
    max_iter=20,
    random_state=2025,
    estimator=LinearRegression()
)

imputed_array = imputer.fit_transform(X_impute)
imputed_df = pd.DataFrame(imputed_array, columns=X_impute.columns)

# ITT Analysis
X = sm.add_constant(imputed_df[['treatment', 'baseline_depression', 'baseline_anxiety']])
y = imputed_df['week12_depression']

model = sm.OLS(y, X).fit()
print(model.summary())

# Note: For true multiple imputation with proper pooling, use R mice package

7. Sensitivity Analyses

Required Robustness Checks

Complete Case Analysis (Per-Protocol)

cc_model <- lm(week12_depression ~ treatment_num + baseline_depression + baseline_anxiety,
               data = data %>% filter(!dropout))
summary(cc_model)

Expected: Likely overestimates treatment effect (worse patients dropped from control)

LOCF (Last Observation Carried Forward)

data_locf <- data %>%
  mutate(week12_depression_locf = ifelse(is.na(week12_depression),
                                          baseline_depression,
                                          week12_depression))
locf_model <- lm(week12_depression_locf ~ treatment_num + baseline_depression + baseline_anxiety,
                 data = data_locf)

Expected: Conservative estimate (assumes no change for dropouts)

Tipping Point Analysis

# Test how much worse dropouts would need to be for treatment effect to disappear
for(delta in seq(0, 10, by = 0.5)) {
  # Add delta to control arm imputations
  # Subtract delta from treatment arm imputations
  # Re-run analysis
  # Record when p > 0.05
}

Pattern-Mixture Model

pmm_model <- lm(week12_depression ~ treatment_num * dropout + baseline_depression + baseline_anxiety,
                data = data)
# Tests if treatment effect differs between completers and dropouts

8. Publication-Ready Methods Section

“This randomized controlled trial enrolled 300 participants (n=150 per arm) to evaluate treatment efficacy on depression symptoms over 12 weeks. Missing data analysis revealed differential attrition: 30% dropout in the control arm versus 15% in the treatment arm (χ²(1) = 10.23, p = 0.001). Baseline depression scores were similar between groups (p = 0.18). Logistic regression analysis indicated that baseline anxiety predicted dropout in the control arm only (OR = 1.18, p = 0.03), suggesting a missing-at-random (MAR) mechanism related to treatment response.

To preserve the intent-to-treat principle and address 22.7% missing outcome data, multiple imputation was conducted using the mice package (van Buuren & Groothuis-Oudshoorn, 2011) in R version 4.x. Forty imputed datasets were generated using predictive mean matching, incorporating treatment assignment, baseline depression, and baseline anxiety as predictors. Treatment effects were estimated using linear regression on each imputed dataset and pooled using Rubin’s rules (Rubin, 1987).

Sensitivity analyses included complete case analysis (n=232), last observation carried forward, and tipping point analysis to assess robustness of findings under varying missing data assumptions. Primary results are reported from the multiple imputation analysis; sensitivity analyses yielded [similar/different] conclusions.”

9. CONSORT Flow Diagram Data

Randomized (n=300)
├── Control Arm (n=150)
│   ├── Completed Week 12 (n=105, 70%)
│   └── Dropout (n=45, 30%)
│       └── Anxiety predicted dropout (OR=1.18, p=0.03)
└── Treatment Arm (n=150)
    ├── Completed Week 12 (n=127, 85%)
    └── Dropout (n=23, 15%)

Analyzed (ITT with MI)
├── Control: n=150
└── Treatment: n=150

Per-Protocol (Sensitivity)
├── Control: n=105
└── Treatment: n=127

10. Summary

Component	Finding/Recommendation
Missing Data %	22.7% (differential: 30% control, 15% treatment)
Pattern	Monotone (endpoint dropout)
Mechanism	MAR (anxiety predicts control dropout)
Primary Method	Multiple Imputation (m=40) for ITT
Sensitivity	Complete case, LOCF, tipping point, pattern-mixture
Regulatory	FDA/EMA compliant approach

Strengths of MI Approach:

Retains all 300 participants (ITT-compliant)
Unbiased estimates under MAR assumption
Uncertainty properly captured in standard errors
Regulatory-accepted method

Limitations:

Cannot rule out MNAR (e.g., severe depression causing dropout)
Sensitivity to imputation model specification
Results should be compared across multiple methods

RCT with Differential Dropout

Input

Output