🤖 AI Summary
This study addresses how to construct prognostic covariates from within-trial data alone, without external historical data, for covariate adjustment in randomized trials. The authors show that “within-trial prognostic score adjustment” is a form of targeted maximum likelihood estimation (TMLE), a well-studied procedure for efficient inference. A simulation study demonstrates the equivalence, and the paper discusses the pros and cons of within-trial prognostic adjustment relative to standard TMLE and to standard prognostic adjustment with historical data. The result answers a question raised by applied researchers and places within-trial prognostic adjustment inside the established framework of efficient estimation for randomized trials.
📝 Abstract
Adjustment for “super” or “prognostic” composite covariates has recently become more popular in randomized trials. These prognostic covariates are often constructed from historical data by fitting a predictive model of the outcome on the raw covariates. A natural question that we have been asked by applied researchers is whether this can be done without the historical data: can the prognostic covariate be constructed or derived from the trial data itself, possibly using different folds of the data, before adjusting for it? Here we clarify that such “within-trial” prognostic adjustment is nothing more than a form of targeted maximum likelihood estimation (TMLE), a well-studied procedure for optimal inference. We demonstrate the equivalence with a simulation study and discuss the pros and cons of within-trial prognostic adjustment (standard efficient estimation) relative to standard TMLE and standard prognostic adjustment with historical data.
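As a rough illustration of the procedure the abstract describes, the sketch below builds a prognostic score from the trial data itself using different folds, then adjusts for it in a linear regression of the outcome on treatment. It is not the authors' implementation. The simulated data, the linear prognostic model fitted on control-arm participants, the five-fold split, and all function names are assumptions made for this example.

```python
import numpy as np


def cross_fitted_prognostic_score(X, a, y, n_folds=5, seed=0):
    """Out-of-fold outcome predictions from a model fit on control-arm data.

    Assumption: a linear prognostic model; any predictive model could be used.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = rng.permutation(n) % n_folds          # balanced random fold labels
    design = np.column_stack([np.ones(n), X])     # intercept plus raw covariates
    score = np.empty(n)
    for k in range(n_folds):
        held_out = folds == k
        train = (~held_out) & (a == 0)            # controls outside the fold
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        score[held_out] = design[held_out] @ coef
    return score


def adjusted_effect(y, a, score):
    """Coefficient on treatment from regressing y on treatment and the score."""
    centered = score - score.mean()
    design = np.column_stack([np.ones(len(y)), a, centered])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate_trial(n=2000, effect=1.0, seed=1):
    """Hypothetical 1:1 randomized trial with three prognostic covariates."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    a = rng.integers(0, 2, size=n)                # randomized treatment indicator
    y = effect * a + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)
    return X, a, y


X, a, y = simulate_trial()
score = cross_fitted_prognostic_score(X, a, y)
estimate = adjusted_effect(y, a, score)
unadjusted = y[a == 1].mean() - y[a == 0].mean()
print(f"unadjusted difference in means: {unadjusted:.3f}")
print(f"within-trial prognostic adjustment: {estimate:.3f}")
```

Each participant's score comes from a model that never saw that participant's outcome, which is the role played by the different folds mentioned in the abstract. A full TMLE implementation would add a targeting step and influence-function-based standard errors, both of which this sketch omits.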