AI Summary
This work addresses the challenge of applying large language models (LLMs) to survival analysis, where standard approaches fail to handle censored data effectively. The authors propose LLMSurvival, a novel framework that enables off-the-shelf LLMs to perform censoring-aware survival prediction without architectural modifications. By reformulating time-to-event modeling as a pairwise ranking task and aggregating comparison outcomes relative to anchor individuals from the training set, the method estimates risk for test samples directly from clinical tabular data. LLMSurvival demonstrates strong portability and supports local deployment. In evaluations on ICU mortality and fragility fracture prediction tasks, it achieves C-indices that outperform the Cox model by 3.1% and 0.5% on average, respectively, and surpasses three deep learning-based survival models as well as established clinical scoring systems such as SAPS-II and FRAX.
Abstract
Objective: Survival analysis is central to medical prediction, yet large language models (LLMs) are rarely used as end-to-end survival models because censoring prevents straightforward supervised fine-tuning. Here we present LLMSurvival, a framework that enables censoring-aware survival analysis with unmodified LLMs operating directly on tabular clinical data.
Materials and Methods: LLMSurvival reformulates time-to-event prediction as pairwise ranking among comparable subjects, and derives test-time risk by aggregating comparisons against anchor individuals from the training cohort.
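As a rough illustration of the two ideas in this paragraph, the sketch below builds censoring-aware comparable pairs and scores a test subject by comparing it against anchors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `comparable_pairs` and `anchor_risk_score`, the `prefers_higher_risk` comparator (a stand-in for a call to a fine-tuned LLM), and the win-fraction aggregation rule are all hypothetical, since the abstract does not specify them.

```python
from itertools import combinations


def comparable_pairs(times, events):
    """Return (earlier, later) index pairs whose ordering is known despite censoring.

    A pair is comparable only when the subject with the shorter observed time
    actually had the event (events == 1). If that subject was censored, the
    true ordering is unknown and the pair is skipped.
    """
    pairs = []
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # ties are skipped in this sketch
        earlier, later = (i, j) if times[i] < times[j] else (j, i)
        if events[earlier] == 1:
            pairs.append((earlier, later))
    return pairs


def anchor_risk_score(prefers_higher_risk, test_subject, anchors):
    """Fraction of anchors judged lower-risk than the test subject.

    `prefers_higher_risk(a, b)` is a hypothetical comparator that returns True
    when `a` is judged to be at higher risk than `b`; in practice this would
    wrap a pairwise query to the model. Win-fraction aggregation is an
    assumption made for illustration.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    wins = sum(1 for anchor in anchors if prefers_higher_risk(test_subject, anchor))
    return wins / len(anchors)


# Toy data: observed times and event indicators (1 = event, 0 = censored).
times = [5, 8, 3, 10]
events = [1, 0, 0, 1]
pairs = comparable_pairs(times, events)

# Toy comparator standing in for the LLM: older subjects are judged higher-risk.
toy_comparator = lambda a, b: a["age"] > b["age"]
anchors = [{"age": 60}, {"age": 80}, {"age": 65}, {"age": 50}]
score = anchor_risk_score(toy_comparator, {"age": 70}, anchors)
```

In this toy example only subject 0 has an observed event before other subjects' observed times, so only pairs led by subject 0 are usable for training; the censored subjects 1 and 2 never appear as the earlier member of a pair.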
Results: Across two clinical tasks (ICU mortality prediction in MIMIC-IV and fragility fracture prediction in a NewYork-Presbyterian/Weill Cornell Medicine cohort), LLMSurvival improves overall concordance over Cox proportional hazards modeling by 3.1% for ICU mortality and 0.5% for fracture risk. Relative to three established deep learning survival models, it improves concordance on average by 2.1% for ICU mortality and 2.8% for fracture risk.
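For readers unfamiliar with concordance, the following sketch computes a Harrell-style concordance index on censored data. It assumes the common definition (fraction of comparable pairs in which the subject with the earlier observed event has the higher predicted risk, with tied risks counted as half); the abstract does not state which C-index variant was used, and the function name `concordance_index` is chosen here for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's observed time. The pair is
    concordant when subject i also has the higher risk score; tied risk
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data: observed times, event indicators (1 = event), and predicted risks.
times = [5, 8, 3, 10]
events = [1, 0, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.4, 0.7, 0.1])
half_tied = concordance_index(times, events, [0.5, 0.5, 0.7, 0.1])
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, so the gains reported above are differences on this 0 to 1 scale.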
Discussion: The results show that survival modeling with censoring can be made compatible with LLM fine-tuning through comparison-based reformulation. The framework demonstrates high portability and outperforms expert-curated scores such as SAPS-II and FRAX across diverse clinical contexts. Furthermore, the framework supports local deployment, as compact, publicly available base models provide sufficient performance.
Conclusion: The LLMSurvival framework serves as a proof of concept for an integrated, censoring-conscious approach to survival analysis via LLMs.