AI Summary
Existing survival analysis models are typically calibrated only at the population level, leading to miscalibration within minority subpopulations and increased risk of clinical misjudgment. To address this, we propose GRADUATE, a novel framework that formulates multi-subpopulation calibration as a constrained optimization problem, jointly optimizing predictive discrimination and cross-group calibration during training. We provide theoretical guarantees on the near-optimality and feasibility of its solution. By introducing a multi-calibration loss, GRADUATE enforces predicted probabilities across subpopulations to converge toward their respective true event rates, while preserving high predictive accuracy. Evaluated on multiple real-world clinical datasets, GRADUATE significantly outperforms state-of-the-art methods, achieving consistently high calibration accuracy across diverse subpopulations. This work establishes a new paradigm for fair and reliable individualized prognostic assessment.
Abstract
Survival analysis is an important problem in healthcare because it models the relationship between an individual's covariates and the onset time of an event of interest (e.g., death). It is important for survival models to be well-calibrated (i.e., for their predicted probabilities to be close to ground-truth probabilities) because badly calibrated systems can result in erroneous clinical decisions. Existing survival models are typically calibrated at the population level only, and thus run the risk of being poorly calibrated for one or more minority subpopulations. We propose a model called GRADUATE that achieves multicalibration by ensuring that all subpopulations are well-calibrated too. GRADUATE frames multicalibration as a constrained optimization problem, and optimizes both calibration and discrimination in-training to achieve a good balance between them. We mathematically prove that the optimization method used yields a solution that is both near-optimal and feasible with high probability. Empirical comparisons against state-of-the-art baselines on real-world clinical datasets demonstrate GRADUATE's efficacy. In a detailed analysis, we elucidate the shortcomings of the baselines vis-a-vis GRADUATE's strengths.
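To make the gap between population-level calibration and multicalibration concrete, the following is a minimal sketch, not GRADUATE's actual loss or training procedure. It assumes a single fixed time horizon, no censoring, and a deliberately coarse notion of calibration (mean predicted event probability versus observed event rate within each group). The function names `subgroup_calibration_gaps` and `max_calibration_gap` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def subgroup_calibration_gaps(pred_probs, events, groups):
    """Return {group: |mean predicted probability - observed event rate|}.

    Simplifying assumptions (not from the paper): one fixed time horizon,
    no censored individuals, and calibration measured only through group means.
    """
    # Per group: [sum of predicted probabilities, number of events, group size]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for p, e, g in zip(pred_probs, events, groups):
        acc = totals[g]
        acc[0] += p
        acc[1] += e
        acc[2] += 1
    return {g: abs(ps / n - es / n) for g, (ps, es, n) in totals.items()}


def max_calibration_gap(pred_probs, events, groups):
    """Worst-case gap over groups: the quantity a constraint could bound."""
    return max(subgroup_calibration_gaps(pred_probs, events, groups).values())


# Toy cohort: a majority group "A" (8 people) and a minority group "B" (2 people).
pred = [0.25] * 8 + [0.5] * 2           # predicted event probability by the horizon
event = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 1 = event observed by the horizon
group = ["A"] * 8 + ["B"] * 2

# Treating everyone as one group gives the population-level gap.
population_gap = max_calibration_gap(pred, event, ["all"] * len(pred))
per_group = subgroup_calibration_gaps(pred, event, group)
worst_gap = max_calibration_gap(pred, event, group)

print(population_gap, per_group, worst_gap)
```

In this toy cohort the population-level gap is zero, yet the minority group's predictions are far from its observed event rate. Under the assumptions above, a constrained formulation would minimize a discrimination loss subject to the worst-case group gap staying below a chosen tolerance. Handling censoring and calibrating over multiple time points would need additional machinery that this sketch omits.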