Modern causal inference approaches to improve power for subgroup analysis in randomized controlled trials

📅 2025-05-13
🤖 AI Summary
Subgroup analyses in randomized controlled trials (RCTs) often have low statistical power because subgroup sample sizes are small. This paper targets the conditional average treatment effect (CATE) and improves its estimation by borrowing external data from similar RCTs or observational studies. It proposes a doubly robust estimator built on machine learning and Bayesian nonparametric techniques. Because treatment probabilities in the external data can be very small, which inflates inverse probability weights and distorts inference, the paper also proposes a covariate balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. Simulations show improved power. The methods are applied to two RCTs and one observational study to evaluate citalopram for negative symptoms in first-episode schizophrenia patients across subgroups defined by duration of untreated psychosis.

📝 Abstract
In randomized controlled trials (RCTs), subgroup analyses are often planned to evaluate the heterogeneity of treatment effects within pre-specified subgroups of interest. However, these analyses frequently have small sample sizes, reducing the power to detect heterogeneous effects. One way to increase power is to borrow external data from similar RCTs or observational studies. In this project, we target the conditional average treatment effect (CATE) as the estimand of interest, provide identification assumptions, and propose a doubly robust estimator that uses machine learning and Bayesian nonparametric techniques. Borrowing data, however, may present the additional challenge of practical violations of the positivity assumption: the conditional probability of receiving treatment in the external data source may be small, leading to large inverse weights and erroneous inferences, and thus negating the potential power gains from borrowing external data. To overcome this challenge, we also propose a covariate balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. We show improved power in various simulations and offer practical recommendations for the application of the proposed methods. Finally, we apply them to evaluate the effectiveness of citalopram, a drug commonly used to treat depression, for negative symptoms in first-episode schizophrenia patients across subgroups defined by duration of untreated psychosis, using data from two RCTs and an observational study.
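To make the estimand and the role of inverse weights concrete, the block below writes out a textbook form of the CATE and of the augmented inverse probability weighting (AIPW) score that doubly robust estimators are commonly built on. The notation ($V$ for the subgroup variable, $X$ for covariates, $e$, $\mu_a$) is assumed here for illustration and is not taken from the paper, whose exact estimator may differ.

```latex
% Assumed notation: A = treatment indicator, Y = outcome,
% X = covariates, V = subgroup-defining variable (a function of X).
\[
  \tau(v) = \mathbb{E}\bigl[\,Y(1) - Y(0) \mid V = v\,\bigr]
\]
% Nuisance functions: propensity score and outcome regressions.
\[
  e(X) = \Pr(A = 1 \mid X), \qquad
  \mu_a(X) = \mathbb{E}[\,Y \mid A = a, X\,], \quad a \in \{0, 1\}
\]
% Standard AIPW score; averaging it over units with V = v estimates tau(v).
\[
  \varphi(O) = \mu_1(X) - \mu_0(X)
    + \frac{A}{e(X)}\bigl(Y - \mu_1(X)\bigr)
    - \frac{1 - A}{1 - e(X)}\bigl(Y - \mu_0(X)\bigr)
\]
```

The factors $1/e(X)$ and $1/(1 - e(X))$ are the inverse weights the abstract refers to: when $e(X)$ is close to 0 or 1 in the external data, a few observations can dominate the average.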
Problem

Research questions and friction points this paper is trying to address.

Enhancing power for subgroup analysis in RCTs using external data
Addressing practical violations of positivity assumption in data borrowing
Proposing robust estimators for conditional average treatment effects
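The positivity issue listed above can be illustrated with a small simulation. This is a minimal sketch under assumed numbers: the propensity ranges, the sample size, and the helper names `inverse_weights` and `effective_sample_size` are hypothetical and do not come from the paper. It compares inverse probability weights from a balanced RCT-like source with those from an external source where treatment is rare, using Kish's effective sample size as a summary.

```python
import random

def inverse_weights(propensities, treated):
    """Inverse probability weights: 1/e for treated units, 1/(1-e) for controls."""
    return [1.0 / e if a == 1 else 1.0 / (1.0 - e)
            for e, a in zip(propensities, treated)]

def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

random.seed(0)
n = 1000

# Hypothetical RCT-like source: every unit has treatment probability 0.5.
rct_e = [0.5] * n
# Hypothetical external source: treatment probability between 1% and 5%.
ext_e = [random.uniform(0.01, 0.05) for _ in range(n)]

# Draw treatment assignments from the assumed propensities.
rct_a = [1 if random.random() < e else 0 for e in rct_e]
ext_a = [1 if random.random() < e else 0 for e in ext_e]

rct_w = inverse_weights(rct_e, rct_a)
ext_w = inverse_weights(ext_e, ext_a)

print("RCT-like: max weight", max(rct_w), "ESS", effective_sample_size(rct_w))
print("External: max weight", max(ext_w), "ESS", effective_sample_size(ext_w))
```

In the balanced source all weights are equal, so the effective sample size equals the nominal one. In the external source the few treated units carry very large weights, so the effective sample size drops well below the nominal sample size, which is the mechanism by which borrowing can fail to deliver the hoped-for power.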
Innovation

Methods, ideas, or system contributions that make the work stand out.

Doubly robust estimator with machine learning
Covariate balancing for positivity violations
Calibrated debiased machine learning estimator
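As a rough illustration of the doubly robust idea named above, the sketch below computes a standard AIPW estimate of a subgroup treatment effect on simulated data. It is not the paper's estimator: the nuisance functions are plain cell means rather than machine learning or Bayesian nonparametric models, no calibration or covariate balancing is applied, and the data-generating process, variable names (`x`, `v`, `a`, `y`), and functions are all hypothetical.

```python
import random
from collections import defaultdict

def group_mean(rows, value, keep=lambda r: True):
    """Mean of `value` within each (x, v) cell, over rows that pass `keep`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        if keep(r):
            cell = (r["x"], r["v"])
            sums[cell] += r[value]
            counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in counts}

def aipw_subgroup_effect(rows, v):
    """Average the AIPW score over units in subgroup v."""
    e = group_mean(rows, "a")                                # propensity score
    mu1 = group_mean(rows, "y", keep=lambda r: r["a"] == 1)  # E[Y | A=1, cell]
    mu0 = group_mean(rows, "y", keep=lambda r: r["a"] == 0)  # E[Y | A=0, cell]
    scores = []
    for r in rows:
        if r["v"] != v:
            continue
        cell = (r["x"], r["v"])
        a, y = r["a"], r["y"]
        # Outcome-model contrast plus inverse-weighted residual corrections.
        score = (mu1[cell] - mu0[cell]
                 + a * (y - mu1[cell]) / e[cell]
                 - (1 - a) * (y - mu0[cell]) / (1 - e[cell]))
        scores.append(score)
    return sum(scores) / len(scores)

def simulate(n, seed=0):
    """Hypothetical data: x confounds treatment, v defines the subgroup."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        v = rng.randint(0, 1)
        a = 1 if rng.random() < (0.6 if x else 0.3) else 0
        tau = 2.0 if v else 1.0          # true subgroup effects: 1.0 and 2.0
        y = x + tau * a + rng.gauss(0.0, 1.0)
        rows.append({"x": x, "v": v, "a": a, "y": y})
    return rows

rows = simulate(4000)
effect_v0 = aipw_subgroup_effect(rows, 0)
effect_v1 = aipw_subgroup_effect(rows, 1)
print("subgroup v=0:", effect_v0, " subgroup v=1:", effect_v1)
```

With saturated cell-mean nuisances the AIPW estimate coincides with a stratified difference in means; the form becomes more useful when flexible learners replace the cell means, since the estimate remains consistent if either the outcome model or the propensity model is correct.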
Antonio D'Alessandro
Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY, 10016
Jiyu Kim
Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY, 10016
Samrachana Adhikari
NYU School of Medicine
Statistics, Causal Inference, Social network analysis
Donald Goff
Department of Psychiatry, New York University School of Medicine, New York, NY, 10016
Falco Bargagli Stoffi
Department of Biostatistics, University of California, Los Angeles, CA, 90095
Michele Santacatterina
NYU Grossman School of Medicine
Biostatistics, Causal Inference, Data Science, Healthcare, Real-World Data