Stop Chasing the C-index: This Is How We Should Evaluate Our Survival Models

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current survival analysis evaluation over-relies on the C-index, which measures only discriminative ability while neglecting critical dimensions such as temporal prediction accuracy and probability calibration, leading to biased model validation. To address this, we propose the "double-helix ladder" framework, asserting that model assumptions and evaluation metrics must be aligned at the same theoretical level. Based on this principle, we establish a multidimensional evaluation criterion encompassing discrimination, calibration, and robustness. Methodologically, our approach integrates statistical evaluation theory, calibration diagnostics, censoring mechanism modeling, and sensitivity analysis. Our investigation uncovers systematic flaws in prevailing evaluation practices and advocates a paradigm shift from single-metric assessment toward a hypothesis-consistent, multidimensionally coordinated validation framework. This work provides both a theoretical foundation and practical guidelines for rigorous survival model evaluation.

📝 Abstract
We argue that many survival analysis and time-to-event models are incorrectly evaluated. First, we survey many examples of evaluation approaches in the literature and find that most rely on concordance (C-index). However, the C-index only measures a model's discriminative ability and does not assess other important aspects, such as the accuracy of the time-to-event predictions or the calibration of the model's probabilistic estimates. Next, we present a set of key desiderata for choosing the right evaluation metric and discuss their pros and cons. These are tailored to the challenges in survival analysis, such as sensitivity to miscalibration and various censoring assumptions. We hypothesize that the current development of survival metrics conforms to a double-helix ladder, and that model validity and metric validity must stand on the same rung of the assumption ladder. Finally, we discuss the appropriate methods for evaluating a survival model in practice and summarize various viewpoints opposing our analysis.
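The abstract's central claim, that the C-index measures only discrimination, follows from its rank-based definition: it counts concordant comparable pairs, so any monotone transformation of the risk scores leaves it unchanged, no matter how miscalibrated the implied probabilities become. A minimal sketch of Harrell's C-index (illustrative code, not from the paper) makes this concrete:

```python
import numpy as np

def c_index(times, events, risk):
    """Harrell's concordance: among comparable pairs (the earlier subject
    experienced the event), count pairs where that subject has higher risk."""
    conc, total = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                total += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / total

times  = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
events = np.array([1, 1, 0, 1, 1], dtype=bool)
risk_a = np.array([0.9, 0.7, 0.5, 0.3, 0.1])  # well-spread risk scores
risk_b = risk_a ** 10                          # same ranking, very different scale

# Rank-based, so both score identically despite wildly different magnitudes.
print(c_index(times, events, risk_a))
print(c_index(times, events, risk_b))
```

Both calls return the same concordance, which is exactly why a model can top a C-index leaderboard while its predicted survival probabilities are badly off.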
Problem

Research questions and friction points this paper is trying to address.

Current survival models rely too heavily on the C-index for evaluation
C-index fails to assess prediction accuracy and model calibration
Need tailored metrics for survival analysis challenges and assumptions
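The second friction point, that the C-index cannot detect miscalibration, is what calibration-sensitive metrics such as the Brier score address. A minimal sketch (ignoring censoring for simplicity; with censored data, inverse-probability-of-censoring weights would be needed, and all names here are illustrative):

```python
import numpy as np

def brier_at(t, times, surv_prob):
    """Brier score at horizon t for uncensored data: mean squared error
    between predicted survival probability and observed survival status."""
    alive = (times > t).astype(float)  # 1 if still event-free at horizon t
    return float(np.mean((surv_prob - alive) ** 2))

times  = np.array([2.0, 3.0, 5.0, 7.0, 11.0])  # all events observed
surv_a = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # predicted S(5) per subject
surv_b = surv_a ** 10                           # same ranking, shrunken scale

# Identical discrimination (same ordering), but very different calibration:
print(brier_at(5.0, times, surv_a))
print(brier_at(5.0, times, surv_b))
```

Here the two models rank subjects identically, so the C-index cannot separate them, while the Brier score penalizes the second model's badly scaled probabilities.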
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposing alternative survival model evaluation metrics
Highlighting limitations of the C-index in assessments
Introducing double-helix ladder metric framework