AI Summary
In cluster-randomized trials, conventional estimators of the average treatment effect (ATE) are biased when outcomes are measured with error (e.g., misclassification in clinical records) and validation subsets are non-randomly selected (e.g., only some parents complete surveys), inducing both misclassification and selection bias. This paper proposes a causal inference framework that jointly leverages silver-standard data (electronic health records) and gold-standard data (parent-reported outcomes), explicitly modeling their dependence while accommodating selection mechanisms that depend on covariates, treatment assignment, and cluster structure, and allowing for heterogeneous treatment effects. The method combines a joint probability model, fitted on the validation subset, with cluster-robust estimation and covariate adjustment. Simulation studies demonstrate good consistency and precision even in small samples. Applied to the ASPIRE trial, the approach corrects for both outcome misclassification and selection bias, yielding more accurate and reliable ATE estimates.
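To make the misclassification correction concrete, the following is a minimal toy sketch, not the paper's actual estimator. All models, parameter values, and variable names are illustrative assumptions; in particular, the sketch assumes validation selection depends only on covariates, treatment, and the silver-standard outcome, a simplification relative to the paper's outcome-dependent selection mechanism. A measurement model for the gold-standard outcome given the silver-standard proxy is fitted on the validation subset and used to impute outcomes trial-wide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# --- simulate a cluster-randomized trial (all numbers are illustrative) ---
n_clusters, m = 200, 20                       # number of clusters, cluster size
A_c = rng.binomial(1, 0.5, n_clusters)        # cluster-level treatment assignment
b_c = rng.normal(0, 0.3, n_clusters)          # cluster random effect
A = np.repeat(A_c, m)
b = np.repeat(b_c, m)
X = rng.normal(size=n_clusters * m)           # individual-level covariate

# gold-standard outcome (e.g., parent-reported receipt)
Y = rng.binomial(1, sigmoid(-0.5 + 1.0 * A + 0.5 * X + b))
# silver-standard proxy (e.g., clinician report): sensitivity 0.80, specificity 0.85
W = np.where(Y == 1,
             rng.binomial(1, 0.80, Y.size),
             rng.binomial(1, 0.15, Y.size))
# non-random validation selection; depends on (X, A, W) but NOT directly on Y,
# a simplifying assumption this toy sketch needs (the paper relaxes it)
S = rng.binomial(1, sigmoid(-1.5 + 0.5 * X + 0.3 * A + 0.5 * W))

# --- measurement model P(Y = 1 | W, A, X) fitted on the validation subset ---
Z = np.column_stack([W, A, X])
meas = LogisticRegression().fit(Z[S == 1], Y[S == 1])
p_hat = meas.predict_proba(Z)[:, 1]           # imputed gold-standard outcome

ate_true      = Y[A == 1].mean() - Y[A == 0].mean()       # oracle benchmark
ate_naive     = W[A == 1].mean() - W[A == 0].mean()       # attenuated by misclassification
ate_corrected = p_hat[A == 1].mean() - p_hat[A == 0].mean()
print(ate_true, ate_naive, ate_corrected)
```

In this setup the naive silver-standard contrast is attenuated by roughly the factor (sensitivity + specificity - 1), while the imputed contrast recovers an estimate close to the oracle difference, because treatment is randomized and the measurement model conditions on everything that drives selection.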
Abstract
Randomized trials are viewed as the benchmark for assessing causal effects of treatments on outcomes of interest. Nonetheless, challenges such as measurement error can undermine the standard causal assumptions for randomized trials. In ASPIRE, a cluster-randomized trial, pediatric primary care clinics were assigned to one of two treatments aimed at promoting clinician delivery of a secure firearm program to parents during well-child visits. A key outcome of interest is thus parent receipt of the program at each visit. Clinicians documented program delivery in patients' electronic health records for all visits, but their reporting is only a proxy measure of the parent receipt outcome. Parents were also surveyed to report directly on program receipt after their child's visit; however, only a small subset of them completed the survey. Here, we develop a causal inference framework for a binary outcome that is subject to misclassification through silver-standard measures (clinician reports), while gold-standard measures (parent reports) are available only for a non-random internal validation subset. We propose a method for identifying the average treatment effect (ATE) that addresses the risks of misclassification and selection bias, even when the outcome (parent receipt) may directly impact selection propensity (survey responsiveness). We show that ATE estimation relies on specifying the relationship between the gold- and silver-standard outcome measures in the validation subset, which may depend on treatment and covariates. Additionally, the clustered design is reflected in our causal assumptions and in our cluster-robust approach to estimation of the ATE. Simulation studies demonstrate acceptable finite-sample operating characteristics of our ATE estimator, supporting its application to ASPIRE.
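Because treatment is assigned at the clinic level, inference must account for within-cluster correlation. The sketch below illustrates one standard way this is done: aggregate to cluster-level outcome means and compare arms with a cluster-level (two-sample) standard error. This is a generic simplification for illustration, not the paper's estimator, which additionally handles covariate adjustment, misclassification, and selection.

```python
import numpy as np

def cluster_robust_ate(y, a, cluster):
    """ATE as the difference in arm-specific averages of cluster-level
    outcome means, with a two-sample cluster-level standard error.
    Assumes treatment is constant within each cluster."""
    y, a, cluster = map(np.asarray, (y, a, cluster))
    ids = np.unique(cluster)
    means = np.array([y[cluster == c].mean() for c in ids])   # cluster means
    arm = np.array([a[cluster == c][0] for c in ids])         # cluster-level arm
    m1, m0 = means[arm == 1], means[arm == 0]
    ate = m1.mean() - m0.mean()
    se = np.sqrt(m1.var(ddof=1) / m1.size + m0.var(ddof=1) / m0.size)
    return ate, se

# --- tiny illustrative trial (numbers are made up) ---
rng = np.random.default_rng(1)
n_c, m = 60, 25
a_c = rng.binomial(1, 0.5, n_c)                 # cluster-level treatment
cl = np.repeat(np.arange(n_c), m)
a = np.repeat(a_c, m)
b = np.repeat(rng.normal(0, 0.2, n_c), m)       # cluster random effect
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.3 + 0.6 * a + b))))
ate, se = cluster_robust_ate(y, a, cl)
print(ate, se)
```

Treating the cluster, rather than the individual, as the unit of analysis keeps the standard error honest about the effective sample size, which is driven by the number of clusters rather than the number of visits.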