Leveraging machine learning to estimate individualized treatment effects in cluster-randomized trials

📅 2026-04-14

🤖 AI Summary
Standard analyses of cluster-randomized trials typically estimate only the average treatment effect, overlooking heterogeneity at both the individual and cluster levels. This work proposes a unified mixed-effects machine learning framework that uses individual- and cluster-level baseline covariates to estimate conditional average treatment effects for continuous outcomes while marginalizing over unobserved cluster heterogeneity. The framework integrates and extends existing methods, including Bayesian additive regression trees with random effects, multilevel Bayesian causal forests, mixed-effects random forests, mixed-effects gradient boosting, and generalized additive mixed models, each with cluster-specific random intercepts to account for within-cluster dependence. The methods are evaluated in simulation studies and applied to a cluster-randomized trial of hypertension management in Ghana; reproducible code and practical implementation guidance accompany the paper.
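
To make "marginalizing over unobserved cluster heterogeneity" concrete, the following is one way the estimand could be written. The notation is an assumption introduced here for illustration and is not taken from the paper: $Y_{ij}(z)$ is the potential outcome of individual $i$ in cluster $j$ under arm $z$, $X_{ij}$ and $W_j$ are individual- and cluster-level baseline covariates, $b_j$ is an unobserved cluster random intercept, and $f$ is an unspecified mean function.

```latex
% Assumed notation (not the paper's): CATE given observed covariates,
% averaging over the unobserved cluster effect b_j.
\tau(x, w)
  = \mathbb{E}\left[ Y_{ij}(1) - Y_{ij}(0) \mid X_{ij} = x,\ W_j = w \right]
  = \mathbb{E}_{b_j}\left\{ \mathbb{E}\left[ Y_{ij}(1) - Y_{ij}(0) \mid X_{ij} = x,\ W_j = w,\ b_j \right] \right\}

% Illustrative random-intercept working model with cluster-level assignment Z_j:
Y_{ij} = f(X_{ij}, W_j, Z_j) + b_j + \varepsilon_{ij},
\qquad b_j \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Under this additive working model the random intercept cancels in the contrast:
\tau(x, w) = f(x, w, 1) - f(x, w, 0)
```

Under this assumed additive form, any learner that estimates $f$ while accounting for $b_j$ yields a CATE estimate as a difference of two predictions.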

📝 Abstract
Cluster-randomized trials (CRTs) are widely used to evaluate interventions delivered at the clinic, practice, or community level. Although standard analyses typically target average treatment effects, such summaries mask potentially meaningful variation in treatment response across individuals and clusters. This work addresses the estimation of conditional average treatment effects (CATEs) for continuous outcomes in two-arm parallel CRTs by defining causal estimands that incorporate both individual- and cluster-level baseline covariates while marginalizing over unobserved cluster heterogeneity. To estimate these quantities, we develop a unified framework based on mixed-effects machine learning, integrating and extending a range of existing approaches, including Bayesian additive regression trees with random effects, multilevel Bayesian causal forests, mixed-effects random forests, several mixed-effects gradient boosting procedures, and generalized additive mixed models, while incorporating cluster-specific random intercepts to account for within-cluster dependence. We evaluate these methods across diverse simulation scenarios and demonstrate their use in the Task Shifting and Blood Pressure Control in Ghana CRT, which investigates strategies for improving hypertension management. Drawing on these investigations, we provide practical guidance for applying mixed-effects machine learning to quantify treatment-effect heterogeneity in CRTs, together with reproducible code that enables investigators to implement all methods within a coherent workflow.
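
As a rough illustration of the general idea, the sketch below simulates a two-arm parallel CRT and estimates a CATE with a random-intercept model fit by an EM-style loop that alternates between fitting the mean function and updating shrunken cluster intercepts. It is not the paper's code or any of its learners: the base learner is a plain linear model with treatment-by-covariate interactions, standing in for the flexible learners named in the abstract, and every name and value (`design`, `cate`, cluster counts, effect sizes) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-arm parallel CRT (all sizes and effects are illustrative assumptions).
J, n = 100, 50                                    # clusters, individuals per cluster
cluster = np.repeat(np.arange(J), n)              # cluster index for each individual
z = rng.permutation(np.repeat([0, 1], J // 2))[cluster]   # assignment at the cluster level
x = rng.normal(size=J * n)                        # individual-level covariate
w = rng.normal(size=J)[cluster]                   # cluster-level covariate
b = rng.normal(scale=0.5, size=J)[cluster]        # unobserved cluster random intercept
tau = 1.0 + 0.5 * x - 0.5 * w                     # true CATE in this simulation
y = 2.0 + x + w + z * tau + b + rng.normal(size=J * n)


def design(z, x, w):
    """Linear mean function with treatment-by-covariate interactions."""
    return np.column_stack([np.ones_like(x), x, w, z, z * x, z * w])


X = design(z, x, w)
counts = np.bincount(cluster, minlength=J)        # cluster sizes
b_hat = np.zeros(J)                               # predicted random intercepts
s2_b, s2_e = 1.0, 1.0                             # starting variance components

for _ in range(200):
    # Fit the mean function to outcomes with the current intercepts removed.
    beta, *_ = np.linalg.lstsq(X, y - b_hat[cluster], rcond=None)
    resid = y - X @ beta
    # Update intercepts: shrunken cluster means of the residuals (BLUP form).
    shrink = counts * s2_b / (counts * s2_b + s2_e)
    b_hat = shrink * np.bincount(cluster, weights=resid, minlength=J) / counts
    # Update variance components, including the posterior variance of each intercept.
    post_var = s2_b * s2_e / (counts * s2_b + s2_e)
    eps = resid - b_hat[cluster]
    s2_e = (eps @ eps + np.sum(counts * post_var)) / y.size
    s2_b = np.mean(b_hat ** 2 + post_var)


def cate(x_new, w_new):
    """Estimated CATE: predicted mean under z=1 minus under z=0 (intercept cancels)."""
    x_new = np.asarray(x_new, dtype=float)
    w_new = np.asarray(w_new, dtype=float)
    treated = design(np.ones_like(x_new), x_new, w_new)
    control = design(np.zeros_like(x_new), x_new, w_new)
    return (treated - control) @ beta


tau_hat = cate(x, w)
rmse = np.sqrt(np.mean((tau_hat - tau) ** 2))
print(f"RMSE of estimated CATE: {rmse:.3f}")
print(f"Estimated variance components: s2_b={s2_b:.3f}, s2_e={s2_e:.3f}")
```

Swapping the `np.linalg.lstsq` step for a flexible regressor gives the general shape of a mixed-effects machine learning fit; whether the paper's methods follow exactly this loop is not stated in the abstract.
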
Problem

Research questions and friction points this paper is trying to address.

cluster-randomized trials
individualized treatment effects
conditional average treatment effects
treatment-effect heterogeneity
mixed-effects models
Innovation

Methods, ideas, or system contributions that make the work stand out.

mixed-effects machine learning
conditional average treatment effects
cluster-randomized trials
treatment-effect heterogeneity
causal inference
Changjun Li
Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
Xi Fang
Yale University
Neuroscience, Machine Learning, Nutrition, Cognition, Neuroimaging
Michael O. Harhay
Center for Clinical Trials Innovation, Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Andrew B. Forbes
Division of Quantitative Research Methodology, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
F. Perry Wilson
Section of Nephrology, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
Guangyu Tong
Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
Fan Li
Department of Statistical Science, Duke University
statistics, causal inference, comparative effectiveness research, missing data, Bayesian