Rethinking external validation for the target population: Capturing patient-level similarity with a generative model

📅 2026-05-11

🤖 AI Summary

Conventional external validation struggles to separate performance degradation caused by model deficiencies from degradation caused by differences between the development and external populations (case-mix effects). To address this, this work proposes a framework that uses generative models, specifically autoencoders, to quantify how closely each external patient resembles the development cohort. Evaluating model performance within subgroups stratified by this similarity separates genuine limitations in generalizability from the effects of dataset shift. The approach also enables validation without sharing the original development data, and it offers a more flexible alternative to traditional linear similarity approaches. Experiments on synthetic data and on data from the Netherlands Heart Registration (predicting mortality after transcatheter aortic valve implantation) show that the framework uncovers clinically relevant performance differences that standard external validation masks, providing a more principled basis for judging to which patients a model can be applied.
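A minimal sketch of how an autoencoder can turn "similarity to the development cohort" into a per-patient score, assuming reconstruction error is used as the (dis)similarity measure. This is a common choice, but the summary does not state which autoencoder-derived quantity the authors use. The class `TinyAutoencoder`, its one-hidden-layer architecture, the simulated data, and all hyperparameters are hypothetical. Only the fitted weights plus the development means and standard deviations are needed to score external patients, which mirrors the idea of validating without sharing raw development data.

```python
import numpy as np


class TinyAutoencoder:
    """Hypothetical one-hidden-layer autoencoder (tanh encoder, linear decoder).

    Trained by full-batch gradient descent on mean squared reconstruction
    error. Illustrative stand-in only, not the paper's architecture.
    """

    def __init__(self, n_features, n_hidden=2, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
        self.b2 = np.zeros(n_features)
        self.lr = lr
        self.mean = None
        self.std = None

    def _standardize(self, X):
        # Uses development-set statistics only; these travel with the model.
        return (np.atleast_2d(X) - self.mean) / self.std

    def _forward(self, Z):
        H = np.tanh(Z @ self.W1 + self.b1)  # encoder: low-dimensional bottleneck
        return H, H @ self.W2 + self.b2     # linear decoder

    def fit(self, X_dev, epochs=3000):
        self.mean = X_dev.mean(axis=0)
        self.std = X_dev.std(axis=0) + 1e-8
        Z = self._standardize(X_dev)
        n = Z.shape[0]
        for _ in range(epochs):
            H, Z_hat = self._forward(Z)
            grad_out = 2.0 * (Z_hat - Z) / n                     # dLoss/dZ_hat
            grad_W2 = H.T @ grad_out
            grad_b2 = grad_out.sum(axis=0)
            grad_pre = (grad_out @ self.W2.T) * (1.0 - H ** 2)  # back through tanh
            grad_W1 = Z.T @ grad_pre
            grad_b1 = grad_pre.sum(axis=0)
            self.W2 -= self.lr * grad_W2
            self.b2 -= self.lr * grad_b2
            self.W1 -= self.lr * grad_W1
            self.b1 -= self.lr * grad_b1
        return self

    def reconstruction_error(self, X):
        """Per-patient mean squared reconstruction error (lower = more similar)."""
        Z = self._standardize(X)
        _, Z_hat = self._forward(Z)
        return ((Z - Z_hat) ** 2).mean(axis=1)


# Usage on simulated data: 4 predictors driven by 2 latent factors.
rng = np.random.default_rng(1)


def simulate(n, shift=0.0):
    z = rng.normal(size=(n, 2))
    X = np.column_stack([z[:, 0], z[:, 1], z[:, 0] + z[:, 1], z[:, 0] - z[:, 1]])
    X = X + rng.normal(scale=0.1, size=X.shape)
    X[:, 0] += shift  # case-mix shift that breaks the learned structure
    return X


ae = TinyAutoencoder(n_features=4).fit(simulate(500))
err_similar = ae.reconstruction_error(simulate(200))
err_shifted = ae.reconstruction_error(simulate(200, shift=3.0))
print(err_similar.mean(), err_shifted.mean())
```

Patients whose predictors follow the development structure should reconstruct with low error, while the shifted patients should score higher on average. With a linear activation this setup would recover roughly a PCA subspace; the nonlinear tanh bottleneck is what makes it a more flexible alternative to linear similarity approaches.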

📝 Abstract

Background: External validation is essential for assessing the transportability of predictive models. However, its interpretation is often confounded by differences between external and development populations. This study introduces a framework to distinguish model deficiencies from case-mix effects.

Method: We propose a framework that quantifies each external patient's similarity to the development data and measures performance in subgroups with varying levels of alignment to the development distribution. We use generative models, specifically autoencoders, to estimate similarity, offering a more flexible alternative to traditional linear approaches and enabling validation without sharing the original development data. The utility of the autoencoder-based similarity measure is demonstrated using synthetic data, and the framework's application is illustrated using data from the Netherlands Heart Registration (NHR) to predict mortality after transcatheter aortic valve implantation.

Results: Our framework revealed substantial variation in model performance across similarity-defined subgroups, differences that remain hidden under conventional external validation yet can meaningfully alter conclusions. In several settings, conventional external validation suggested poor overall performance. However, after accounting for differences in patient characteristics, model performance in some subgroups was consistent with internal validation results. Conversely, apparently acceptable overall performance could mask clinically relevant performance deficits in specific subgroups.

Conclusion: The proposed framework enhances the interpretability of external validation by linking model performance to population alignment with the development data. This provides a more principled basis for deciding whether a model is transportable and to which patients it can be safely applied.
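The subgroup analysis described in the Method can be sketched as follows, under stated assumptions: dissimilarity is a per-patient score such as autoencoder reconstruction error, subgroups are formed from fixed cut-points, discrimination is summarized by the ROC AUC (computed via the Mann-Whitney statistic), and calibration-in-the-large by observed versus mean predicted risk. The function names, the cut-point scheme, and the toy values are hypothetical; they only illustrate how an acceptable overall AUC can hide a poorly performing subgroup.

```python
import numpy as np


def auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney statistic (ties count as 1/2)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined when a subgroup contains one class only
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def performance_by_similarity(dissimilarity, y_true, y_pred, edges):
    """Group external patients by dissimilarity and summarize performance per group.

    dissimilarity: e.g. reconstruction error (higher = less similar).
    edges: increasing cut-points; group 0 holds scores below edges[0],
    group k holds [edges[k-1], edges[k]), the last group holds >= edges[-1].
    """
    dissimilarity = np.asarray(dissimilarity, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.digitize(dissimilarity, edges)
    results = {}
    for g in range(len(edges) + 1):
        mask = groups == g
        n = int(mask.sum())
        results[g] = {
            "n": n,
            "auc": auc(y_true[mask], y_pred[mask]),
            "observed_rate": float(y_true[mask].mean()) if n else float("nan"),
            "mean_predicted": float(y_pred[mask].mean()) if n else float("nan"),
        }
    return results


# Toy values (not data from the paper): 4 similar and 4 dissimilar patients.
dissim = [0.1, 0.2, 0.3, 0.4, 1.5, 1.6, 1.7, 1.8]
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3, 0.6, 0.4]

overall = auc(y_true, y_pred)
res = performance_by_similarity(dissim, y_true, y_pred, edges=[1.0])
print(overall, {g: r["auc"] for g, r in res.items()})
# prints: 0.75 {0: 1.0, 1: 0.0}
```

On these toy values the overall AUC of 0.75 looks acceptable, yet the dissimilar group is ranked entirely wrong. In practice the cut-points might instead be taken from quantiles of the development cohort's own scores, so that each group corresponds to a level of alignment with the development distribution.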

Problem

Research questions and friction points this paper is trying to address.

external validation

transportability

case-mix effects

predictive models

population alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

generative model

external validation

autoencoder