🤖 AI Summary
Clinical prediction models often lose performance under case-mix shift, yet how discrimination and calibration are each affected, particularly across causal directions (prognostic: predictors → outcome; diagnostic: outcome → predictors), remains poorly characterized.
Method: We propose a novel framework, driven by causal direction, for analyzing the impact of case-mix shift, combining causal modeling, a controlled simulation study, and empirical validation on real-world cardiovascular prediction models.
Contribution/Results: We identify an asymmetric robustness pattern. When predicting in the causal direction (often the case for prognostic tasks), calibration remains stable under case-mix shift while discrimination does not; when predicting in the anti-causal direction (often the case for diagnostic tasks), discrimination remains stable while calibration does not. Causal direction therefore determines which performance metric can be expected to carry over between deployment settings, and it provides interpretable, causally grounded criteria for evaluating and updating models across institutions.
📝 Abstract
Prediction models inform clinical decisions in diagnosis, prognosis, and treatment planning, so their predictive performance must be reliable. This performance is typically assessed through discrimination and calibration. Changes in the data distribution affect model performance, and the setting in which a model is applied may differ in important ways from when and where its performance was last evaluated. In healthcare, a typical change is a shift in case-mix: for cardiovascular risk management, for example, a general practitioner sees a different mix of patients than a specialist in a tertiary hospital. This work introduces a novel framework that differentiates the effects of case-mix shifts on discrimination and calibration based on the causal direction of the prediction task. When prediction is in the causal direction (often the case for prognosis predictions), calibration remains stable under case-mix shifts, while discrimination does not. Conversely, when prediction is in the anti-causal direction (often the case for diagnosis predictions), discrimination remains stable, but calibration does not. A simulation study and empirical validation using cardiovascular disease prediction models demonstrate the implications of this framework. The causal case-mix framework provides insights for developing, evaluating, and deploying prediction models across different clinical settings, emphasizing the importance of understanding the causal structure of the prediction task.
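The asymmetry described above can be illustrated with a small toy simulation. The sketch below is not the authors' simulation study. It assumes a single Gaussian predictor, a model that equals the true risk function in the development setting, and arbitrarily chosen parameter values. In the causal setting, case-mix shift is modeled as a change in the predictor distribution with the outcome mechanism held fixed. In the anti-causal setting, it is modeled as a change in outcome prevalence with the predictor distribution given the outcome held fixed. All function names and parameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; assumes no tied scores."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize(preds, ys):
    """Return (AUC, mean predicted risk, observed event rate)."""
    return auc(preds, ys), sum(preds) / len(preds), sum(ys) / len(ys)


def causal_setting(rng, n, x_mean, x_sd, a=-1.0, b=1.5):
    """Predictor causes outcome: case-mix changes P(X), P(Y | X) is fixed."""
    xs = [rng.gauss(x_mean, x_sd) for _ in range(n)]
    # Assumption: the model is the true risk function in every setting.
    preds = [sigmoid(a + b * x) for x in xs]
    ys = [1 if rng.random() < p else 0 for p in preds]
    return summarize(preds, ys)


def anticausal_setting(rng, n, prevalence, dev_prevalence=0.2, d=1.5):
    """Outcome causes predictor: case-mix changes P(Y), P(X | Y) is fixed."""
    ys = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    xs = [rng.gauss(d * y, 1.0) for y in ys]
    # Assumption: the model is the exact posterior at the development
    # prevalence and is applied unchanged in the new setting.
    intercept = math.log(dev_prevalence / (1 - dev_prevalence)) - d * d / 2
    preds = [sigmoid(intercept + d * x) for x in xs]
    return summarize(preds, ys)


rng = random.Random(0)
n = 20000
results = {
    "causal, development": causal_setting(rng, n, x_mean=0.0, x_sd=2.0),
    "causal, shifted": causal_setting(rng, n, x_mean=1.0, x_sd=0.5),
    "anti-causal, development": anticausal_setting(rng, n, prevalence=0.2),
    "anti-causal, shifted": anticausal_setting(rng, n, prevalence=0.5),
}
for name, (auc_value, mean_pred, observed) in results.items():
    print(f"{name:26s} AUC={auc_value:.3f} "
          f"mean predicted={mean_pred:.3f} observed={observed:.3f}")
```

Under these assumptions, the mean predicted risk should track the observed event rate in both causal settings while the AUC differs between them, and the AUC should be nearly identical in both anti-causal settings while the mean predicted risk departs from the observed rate after the prevalence shift. Comparing mean predicted risk with the observed rate only checks calibration-in-the-large; a fuller assessment would also examine the calibration slope or a calibration curve.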