Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners

📅 2025-04-21

🤖 AI Summary

Problem: Most statistical methods for evaluating surrogate markers assume that treatment is randomized, which does not hold in real-world non-randomized data. The few methods that allow non-randomized treatment adjust for confounding but cannot assess how surrogate strength varies with patient characteristics. Method: We propose a framework for quantifying individualized surrogate strength in non-randomized data. It combines meta-learners (e.g., S-, T-, and X-learners) with flexible machine learning models to adjust for confounding while modeling patient-level heterogeneity in surrogate strength. Contribution/Results: The approach estimates surrogate strength as a function of patient characteristics in non-randomized settings and identifies individuals for whom the surrogate is a valid replacement for the primary outcome. A simulation study and an application assessing hemoglobin A1c (HbA1c) as a surrogate for fasting plasma glucose (FPG) are used to evaluate its performance.
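The summary does not spell out how surrogate strength is measured. A common choice, assumed here rather than taken from the paper, is the proportion of the treatment effect explained by the surrogate, evaluated conditionally on patient characteristics X = x. With treatment A, surrogate S, primary outcome Y, and potential outcomes Y^(a), and assuming no unmeasured confounding given X plus overlap between treatment groups, one conditional version reads:

```latex
\begin{aligned}
\Delta(x)   &= E\!\left[Y^{(1)} - Y^{(0)} \mid X = x\right]
             = E[Y \mid A = 1, X = x] - E[Y \mid A = 0, X = x], \\
\mu_a(s, x) &= E[Y \mid A = a, S = s, X = x], \qquad a \in \{0, 1\}, \\
\Delta_S(x) &= \int \left\{ \mu_1(s, x) - \mu_0(s, x) \right\} dF_0(s \mid x), \\
R_S(x)      &= 1 - \frac{\Delta_S(x)}{\Delta(x)}.
\end{aligned}
```

Here F_0(s | x) is the distribution of S among untreated patients with X = x, and Delta_S(x) is the treatment effect that remains once the surrogate is held at its untreated distribution. Values of R_S(x) near 1 suggest that, for patients with characteristics x, the surrogate captures nearly all of the treatment effect; each conditional mean above can be estimated with a meta-learner.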

📝 Abstract

Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes extends beyond these settings and may be more pronounced in real-world public health and social science research, where randomized trials are often impractical. Research on identifying surrogates in real-world non-randomized data is scarce, as available statistical approaches for evaluating surrogate markers tend to rely on the assumption that treatment is randomized. While the few methods that allow for non-randomized treatment/exposure appropriately handle confounding individual characteristics, they do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in real-world, i.e., non-randomized, data and implement this framework using various meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify individuals for whom the surrogate is a valid replacement for the primary outcome. We examine the performance of our methods via a simulation study and an application examining heterogeneity in the strength of hemoglobin A1c as a surrogate for fasting plasma glucose.
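A minimal sketch of how a T-learner (separate outcome models per treatment arm) could estimate covariate-specific surrogate strength from confounded data. Several pieces are assumptions, not the paper's method: strength is measured as R_S(x) = 1 - Delta_S(x) / Delta(x), where Delta(x) is the conditional treatment effect on the primary outcome and Delta_S(x) is the effect left over when the surrogate is held at its control-arm distribution; ordinary least squares stands in for the flexible machine learning models; the data are simulated with a single confounder; and the 0.5 validity threshold is hypothetical. All function names are illustrative.

```python
import numpy as np


def fit_ols(features, target):
    """Least-squares coefficients; a simple stand-in for a flexible ML learner."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef


def design_x(x):
    # Columns: intercept, x
    return np.column_stack([np.ones_like(x), x])


def design_sx(s, x):
    # Columns: intercept, s, x, s*x. The model is linear in s, so plugging in
    # E[S | A=0, X=x] equals averaging over the control-arm distribution of S.
    return np.column_stack([np.ones_like(x), s, x, s * x])


def t_learner_surrogate_strength(x, a, s, y, x_new):
    """Return (delta, delta_s, r_s) at covariate values x_new using per-arm models."""
    treated, control = a == 1, a == 0
    # Delta(x): per-arm models of Y given X (identifies the effect if X captures confounding)
    nu1 = fit_ols(design_x(x[treated]), y[treated])
    nu0 = fit_ols(design_x(x[control]), y[control])
    delta = design_x(x_new) @ (nu1 - nu0)
    # m0(x) = E[S | A=0, X=x]: mean surrogate value under control
    m0 = design_x(x_new) @ fit_ols(design_x(x[control]), s[control])
    # Delta_S(x): effect left over when S is held at its control-arm value
    mu1 = fit_ols(design_sx(s[treated], x[treated]), y[treated])
    mu0 = fit_ols(design_sx(s[control], x[control]), y[control])
    delta_s = design_sx(m0, x_new) @ (mu1 - mu0)
    return delta, delta_s, 1.0 - delta_s / delta


# Simulated data with one confounder X: treatment is more likely when X is large.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(2.0 * x - 1.0))))
s = x + 2.0 * a + rng.normal(0.0, 1.0, n)
y = s + 3.0 * x * a + rng.normal(0.0, 0.5, n)

grid = np.array([0.0, 0.5, 1.0])
delta, delta_s, r_s = t_learner_surrogate_strength(x, a, s, y, grid)
# Population values in this simulation: R_S(x) = 1 - 3x / (2 + 3x),
# i.e. 1.00, 0.57, 0.40 at x = 0, 0.5, 1.
valid = r_s >= 0.5  # hypothetical threshold for calling the surrogate valid
print(np.round(r_s, 2), valid)
```

An S-learner would instead fit one model with A as a feature, and an X-learner would add a second stage that imputes individual effects and regresses them on covariates. If OLS is replaced by a learner that is nonlinear in s, plugging in the control-arm mean of S is no longer exact, and mu_1 - mu_0 would need to be averaged over control-arm surrogate values instead.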

Problem

Assessing surrogate marker heterogeneity in non-randomized real-world data

Quantifying surrogate strength variation across patient characteristics

Identifying patient subgroups for which the surrogate is a valid replacement for the primary outcome

Innovation

Meta-learners assess surrogate heterogeneity in real-world data

Flexible machine learning methods handle confounding variables

Framework identifies individuals for whom the surrogate validly replaces the primary outcome
