Beyond Exchangeability: Distribution-Shift-Aware Integration of External Control Data in Randomized Trials

📅 2026-05-27
🤖 AI Summary
This study addresses the population distribution shift that arises when randomized controlled trials, which are costly and hard to scale, are augmented with external control data. Such shift violates the conventional exchangeability assumption and can bias causal effect estimates. The authors propose a distribution-shift-aware semiparametric framework that explicitly models the discrepancy between trial participants and external controls. Augmented estimators are built by adapting trial-only efficient influence functions through calibration equations that balance the two populations, and an adaptive shrinkage estimator preserves consistency while guaranteeing efficiency dominance over the trial-only benchmark. Synthetic experiments and a real data application illustrate the practical advantages of the approach, which relaxes the stringent exchangeability requirement.
📝 Abstract
Randomized controlled trials (RCTs) are the gold standard for evaluating causal effects but are often costly and difficult to scale; consequently, they are frequently augmented with auxiliary external controls in many applications. Prior approaches for borrowing such data typically rely on exchangeability, under which the external controls are readily usable for inference in the trial population. In practice, however, differences in eligibility criteria, standard of care, and data collection procedures may induce distribution shifts between the RCT and the external controls, rendering exchangeability implausible. In this paper, we propose a novel framework for integrating external controls by explicitly modeling these distribution shifts. We construct augmented estimators by adapting trial-only efficient influence functions through calibration equations that balance the trial and external populations, thereby fully exploiting the external control data even when exchangeability fails. We further develop an adaptive shrinkage estimator that preserves consistency while guaranteeing efficiency dominance over the trial-only benchmark. Synthetic experiments and a real data application demonstrate the practical advantages of the proposed approaches.
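As a rough illustration of what a calibration equation that balances two populations can look like, the sketch below reweights simulated external controls so that their weighted covariate means match the trial's covariate means. It is not the paper's estimator: the exponential-tilting form of the weights, the function names (`calibration_weights`, `tilt`, `log_sum_exp`), and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def log_sum_exp(eta):
    # Numerically stable log(sum(exp(eta))).
    m = eta.max()
    return m + np.log(np.exp(eta - m).sum())

def tilt(z, lam):
    # Exponential-tilting weights w_i proportional to exp(z_i' lam), normalized to sum to 1.
    eta = z @ lam
    w = np.exp(eta - eta.max())
    return w / w.sum()

def calibration_weights(x_ext, target_mean, n_iter=100, tol=1e-10):
    """Hypothetical helper: solve sum_i w_i (x_i - target_mean) = 0 over
    external controls with a damped Newton iteration."""
    z = x_ext - target_mean
    lam = np.zeros(z.shape[1])
    for _ in range(n_iter):
        w = tilt(z, lam)
        grad = w @ z  # remaining covariate imbalance
        if np.max(np.abs(grad)) < tol:
            break
        # Weighted covariance of z, the Hessian of the convex dual objective.
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-10 * np.eye(lam.size), grad)
        # Halve the step until the dual objective no longer increases.
        t = 1.0
        f0 = log_sum_exp(z @ lam)
        while log_sum_exp(z @ (lam - t * step)) > f0 + 1e-12 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return tilt(z, lam)

# Simulated data (assumption): external controls are shifted relative to the trial.
rng = np.random.default_rng(0)
x_trial = rng.normal(loc=0.5, scale=1.0, size=(500, 2))
x_ext = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))

target = x_trial.mean(axis=0)
w = calibration_weights(x_ext, target)

imbalance_before = np.abs(x_ext.mean(axis=0) - target).max()
imbalance_after = np.abs(w @ x_ext - target).max()
print("max mean imbalance before:", imbalance_before)
print("max mean imbalance after: ", imbalance_after)
```

Balancing covariate means is only one possible choice of calibration target. The framework described in the abstract calibrates efficient influence functions and adds an adaptive shrinkage step, neither of which this sketch attempts to reproduce.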
Problem

Research questions and friction points this paper is trying to address.

distribution shift
external control
randomized controlled trials
exchangeability
causal inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

distribution shift
external control data
calibration equations
efficient influence function
adaptive shrinkage