Modern causal inference approaches to improve power for subgroup analysis in randomized controlled trials

📅 2025-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Subgroup analyses in randomized controlled trials (RCTs) often have low statistical power because subgroup sample sizes are small. This paper targets the conditional average treatment effect (CATE) and aims to increase power by borrowing external data from similar RCTs or observational studies. It proposes a doubly robust estimator that uses machine learning and Bayesian nonparametric techniques. Because the probability of receiving treatment in the external data can be very small, producing large inverse weights and erroneous inferences, the paper also proposes a covariate balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. Simulation studies show improved power. The methods are applied to two RCTs and one observational study to evaluate the effectiveness of citalopram for negative symptoms in first-episode schizophrenia patients, across subgroups defined by duration of untreated psychosis.

📝 Abstract

In randomized controlled trials (RCTs), subgroup analyses are often planned to evaluate the heterogeneity of treatment effects within pre-specified subgroups of interest. However, these analyses frequently have small sample sizes, reducing the power to detect heterogeneous effects. One way to increase power is to borrow external data from similar RCTs or observational studies. In this project, we target the conditional average treatment effect (CATE) as the estimand of interest, provide identification assumptions, and propose a doubly robust estimator that uses machine learning and Bayesian nonparametric techniques. Borrowing data, however, may present the additional challenge of practical violations of the positivity assumption: the conditional probability of receiving treatment in the external data source may be small, leading to large inverse weights and erroneous inferences, thus negating the potential power gains from borrowing external data. To overcome this challenge, we also propose a covariate balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. We show improved power in various simulations and offer practical recommendations for the application of the proposed methods. Finally, we apply them to evaluate the effectiveness of citalopram, a drug commonly used to treat depression, for negative symptoms in first-episode schizophrenia patients across subgroups defined by duration of untreated psychosis, using data from two RCTs and an observational study.
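To make the term "doubly robust" concrete, the sketch below shows a generic augmented inverse probability weighting (AIPW) estimate of the average treatment effect within one subgroup of a single simulated trial. It is not the paper's estimator: it does no data borrowing, uses plain least squares instead of machine learning or Bayesian nonparametric models, and skips cross-fitting. The function name `aipw_subgroup_effect`, the simulated data, and every parameter value are assumptions made for illustration.

```python
import numpy as np


def aipw_subgroup_effect(y, a, e, mu1, mu0, subgroup):
    """Generic AIPW estimate of the average treatment effect in a subgroup.

    y: outcomes; a: binary treatment indicator; e: P(A=1 | X);
    mu1, mu0: outcome predictions under treatment and control;
    subgroup: boolean mask selecting the subgroup of interest.
    """
    y, a, e, mu1, mu0 = (np.asarray(v, dtype=float) for v in (y, a, e, mu1, mu0))
    subgroup = np.asarray(subgroup, dtype=bool)
    # Pseudo-outcome: outcome-model contrast plus inverse-weighted residuals.
    phi = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)
    phi_s = phi[subgroup]
    est = phi_s.mean()
    # Plug-in standard error that ignores nuisance-estimation uncertainty.
    se = phi_s.std(ddof=1) / np.sqrt(phi_s.size)
    return est, se


# Hypothetical simulated trial: 1:1 randomization, treatment effect of 0.5
# outside the subgroup and 1.5 inside it.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
s = x > 0                      # subgroup indicator
a = rng.binomial(1, 0.5, size=n)
y = 1.0 + x + a * (0.5 + 1.0 * s) + rng.normal(size=n)

# Outcome regressions fitted separately in each arm by least squares.
design = np.column_stack([np.ones(n), x, s])
beta1 = np.linalg.lstsq(design[a == 1], y[a == 1], rcond=None)[0]
beta0 = np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)[0]
mu1, mu0 = design @ beta1, design @ beta0
e = np.full(n, 0.5)            # propensity is known in a 1:1 trial

est, se = aipw_subgroup_effect(y, a, e, mu1, mu0, s)
print(f"subgroup effect estimate: {est:.3f} (SE {se:.3f})")
```

The estimate stays consistent if either the propensity model or the outcome model is correct, which is the sense of "doubly robust". The division by `e` and `1 - e` is also where very small treatment probabilities in borrowed data would inflate the weights.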

Problem

Research questions and friction points this paper is trying to address.

Enhancing power for subgroup analysis in RCTs using external data

Addressing practical violations of positivity assumption in data borrowing

Proposing robust estimators for conditional average treatment effects
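The positivity problem listed above can be illustrated with a small numerical sketch. It computes inverse probability weights and the Kish effective sample size for two hypothetical sets of propensity scores. The helper `inverse_weight_summary` and the toy values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def inverse_weight_summary(e, a):
    """Return the largest inverse probability weight and the Kish
    effective sample size, (sum w)^2 / sum(w^2)."""
    e = np.asarray(e, dtype=float)
    a = np.asarray(a, dtype=float)
    # Treated units get weight 1/e, control units get 1/(1 - e).
    w = a / e + (1 - a) / (1 - e)
    ess = w.sum() ** 2 / (w ** 2).sum()
    return w.max(), ess


# 100 hypothetical units, alternating treated and control.
a = np.tile([1, 0], 50)

# Well-behaved case: every unit has propensity 0.5.
e_ok = np.full(100, 0.5)
max_ok, ess_ok = inverse_weight_summary(e_ok, a)

# Near-violation: one treated unit has propensity 0.01.
e_bad = e_ok.copy()
e_bad[0] = 0.01
max_bad, ess_bad = inverse_weight_summary(e_bad, a)

print(f"balanced:       max weight {max_ok:.1f}, effective n {ess_ok:.1f}")
print(f"near-violation: max weight {max_bad:.1f}, effective n {ess_bad:.1f}")
```

In this toy case a single unit with a very small propensity score dominates the weighted sample and cuts the effective sample size from 100 to fewer than 10, which shows how borrowed data can erase the power gain it was meant to provide.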

Innovation

Methods, ideas, or system contributions that make the work stand out.

Doubly robust estimator with machine learning

Covariate balancing for positivity violations

Calibrated debiased machine learning estimator

🔎 Similar Papers

Targeting Relative Risk Heterogeneity with Causal Forests

2023-09-26 · arXiv.org · Citations: 1

Identifying treatment response subgroups in observational time-to-event data

2024-08-06 · arXiv.org · Citations: 0
