Correcting for Missing Data When Evaluating Surrogate Markers in a Clinical Trial

📅 2026-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to surrogate marker evaluation do not account for missing data and instead rely on complete-case analyses, which can be biased and statistically inefficient. This work proposes missing data corrections for both nonparametric and parametric surrogate marker evaluation, based on inverse probability weighting (IPW) and semiparametric maximum likelihood estimation (SMLE). The two corrections have complementary strengths in computational ease, robustness, and statistical efficiency. Simulation studies show that the proposed methods remain unbiased under a broader range of missing data mechanisms than complete-case analysis and can help retain the statistical precision of the full trial. Their practical utility is illustrated through an application to a diabetes clinical trial, and all methods are implemented in the MissSurrogate R package.

📝 Abstract

Evaluating treatment effects is critical in clinical trials but sometimes involves lengthy, invasive, or costly follow-up procedures. In these cases, surrogate markers, which provide intermediate measures of the long-term treatment effect, allow clinicians to obtain results faster and more efficiently than would have otherwise been possible. Prior to adoption, it is vital that the utility of surrogate markers (i.e., their ability to capture the treatment effect on the primary outcome) is statistically validated. Many frameworks for evaluating surrogate markers have been proposed, but they do not account for missing data. Instead, they rely on complete cases (the subset of patients without missing data), which can be inefficient and biased. To improve on this, we propose methods to accommodate missing data in nonparametric and parametric surrogate evaluation via inverse probability weighting (IPW) and semiparametric maximum likelihood estimation (SMLE). Through simulation studies, we demonstrate that the proposed methods remain unbiased under a broader range of missing data mechanisms than complete case analysis and can help retain the statistical precision of the full trial. We illustrate their practical utility through an application to a diabetes clinical trial. Moreover, our missing data corrections have complementary strengths with respect to computational ease, robustness, and statistical efficiency. All methods are implemented in the MissSurrogate R package.
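To make the contrast between complete-case analysis and IPW concrete, the following is a minimal sketch of the general weighting idea on hypothetical toy data. It is not the paper's estimator and does not use the MissSurrogate API; the data, the function names, and the stratified estimate of the observation probability are all assumptions made for illustration. The surrogate is missing more often in one covariate stratum, so the complete-case mean is pulled toward the other stratum, while weighting each observed patient by the inverse of the estimated probability of being observed restores the balance.

```python
from collections import defaultdict

# Hypothetical toy data: (covariate x, surrogate s, observed flag r).
# Values of s for unobserved patients are kept here only to compute the
# full-data benchmark; in a real trial they would be unknown.
patients = [
    (0, 1.0, True), (0, 2.0, True), (0, 1.0, True), (0, 2.0, True),
    (1, 3.0, True), (1, 5.0, True), (1, 4.0, False), (1, 4.0, False),
]


def full_data_mean(data):
    """Benchmark: mean surrogate value if nothing were missing."""
    return sum(s for _, s, _ in data) / len(data)


def complete_case_mean(data):
    """Mean over patients whose surrogate was observed, unweighted."""
    observed = [s for _, s, r in data if r]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Weighted mean over observed patients, with weights 1 / P(observed | x).

    P(observed | x) is estimated by the observed fraction within each
    stratum of x (an assumed, fully stratified missingness model).
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n_observed, n_total]
    for x, _, r in data:
        counts[x][0] += int(r)
        counts[x][1] += 1
    prob_observed = {x: n_obs / n_tot for x, (n_obs, n_tot) in counts.items()}

    weighted_sum = 0.0
    weight_total = 0.0
    for x, s, r in data:
        if r:
            w = 1.0 / prob_observed[x]
            weighted_sum += w * s
            weight_total += w
    return weighted_sum / weight_total


full = full_data_mean(patients)
cc = complete_case_mean(patients)
ipw = ipw_mean(patients)
print(f"full data: {full:.3f}  complete case: {cc:.3f}  IPW: {ipw:.3f}")
```

In this toy setup missingness depends only on the fully observed covariate, which is the kind of mechanism under which weighting by the estimated observation probability is expected to help. The exact agreement between the IPW and full-data means here is a feature of the constructed data, not a general guarantee.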

Problem

Research questions and friction points this paper is trying to address.

surrogate markers

missing data

clinical trials

treatment effect

statistical validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

surrogate markers

missing data

inverse probability weighting