Towards end-to-end LLM-based censoring-aware survival analysis

2026-05-24
AI Summary
This work addresses the challenge of applying large language models (LLMs) to survival analysis, where standard approaches fail to handle censored data effectively. The authors propose LLMSurvival, a novel framework that enables off-the-shelf LLMs to perform censoring-aware survival prediction without architectural modifications. By reformulating time-to-event modeling as a pairwise ranking task and aggregating comparison outcomes relative to anchor individuals from the training set, the method estimates risk for test samples directly from clinical tabular data. LLMSurvival demonstrates strong portability and supports local deployment. In evaluations on ICU mortality and fragility fracture prediction tasks, it achieves C-indices that outperform the Cox model by 3.1% and 0.5% on average, respectively, and surpasses three deep learning-based survival models as well as established clinical scoring systems such as SAPS-II and FRAX.
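For readers unfamiliar with the metric, the C-index mentioned in the summary can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is a textbook formulation, not code from the paper; the function name `concordance_index` and the handling of tied times (skipped here) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk scores (higher means an earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an observed
        # event; tied times are skipped here for simplicity (an assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. Pairs whose earlier time is censored contribute nothing, which is how censoring enters the metric.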
๐Ÿ“ Abstract
Objective: Survival analysis is central to medical prediction, yet large language models (LLMs) are rarely used as end-to-end survival models because censoring prevents straightforward supervised fine-tuning. Here we present LLMSurvival, a framework that enables censoring-aware survival analysis with unmodified LLMs operating directly on tabular clinical data. Materials and Methods: LLMSurvival reformulates time-to-event prediction as pairwise ranking among comparable subjects and derives test-time risk by aggregating comparisons against anchor individuals from the training cohort. Results: Across two clinical tasks (ICU mortality prediction in MIMIC-IV and fragility fracture prediction in a NewYork-Presbyterian/Weill Cornell Medicine cohort), LLMSurvival improves overall concordance over Cox proportional hazards modeling by 3.1% for ICU mortality and 0.5% for fracture risk. Relative to three established deep learning survival models, it improves concordance by an average of 2.1% for ICU mortality and 2.8% for fracture risk. Discussion: The results show that survival modeling with censoring can be made compatible with LLM fine-tuning through comparison-based reformulation. The framework demonstrates high portability and outperforms expert-curated scores such as SAPS-II and FRAX across diverse clinical contexts. Furthermore, the framework supports local deployment, as compact, publicly available base models provide sufficient performance. Conclusion: The LLMSurvival framework serves as a proof of concept for an integrated, censoring-conscious approach to survival analysis via LLMs.
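The two ideas in the Materials and Methods sentence, comparable pairs under censoring and anchor-based risk aggregation, can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the record layout, the names `comparable_pairs`, `anchor_risk`, and `prefers_first`, and the win-fraction aggregation rule are all assumptions, and a toy comparator stands in for the LLM call.

```python
def comparable_pairs(cohort):
    """Return (earlier, later) index pairs whose ordering is known.

    Each record is assumed to look like
    {"features": {...}, "time": float, "event": 0 or 1}.
    Subject i is known to fail before subject j only if i's event was
    observed and i's time is strictly shorter than j's observed time.
    """
    pairs = []
    for i, si in enumerate(cohort):
        for j, sj in enumerate(cohort):
            if i != j and si["event"] == 1 and si["time"] < sj["time"]:
                pairs.append((i, j))
    return pairs


def anchor_risk(test_features, anchors, prefers_first):
    """Risk score as the fraction of anchors judged lower-risk than the test subject.

    `prefers_first(x, y)` is a hypothetical stand-in for the pairwise model:
    it returns True when x is judged to experience the event before y.
    The win-fraction rule is an assumed aggregation, chosen for simplicity.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    wins = sum(1 for a in anchors if prefers_first(test_features, a["features"]))
    return wins / len(anchors)


# Toy comparator used only for this example: older subjects are higher risk.
def older_is_riskier(x, y):
    return x["age"] > y["age"]


cohort = [
    {"features": {"age": 80}, "time": 2.0, "event": 1},
    {"features": {"age": 60}, "time": 5.0, "event": 0},  # censored
    {"features": {"age": 70}, "time": 7.0, "event": 1},
]
training_pairs = comparable_pairs(cohort)
score = anchor_risk({"age": 75}, cohort, older_is_riskier)
```

In this toy cohort the censored subject never appears as the earlier member of a pair, because its true event time is unknown. Pairs of this kind could serve as labeled comparisons for fine-tuning, while `anchor_risk` turns pairwise judgments at test time into a single score that can be ranked.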
Problem

Research questions and friction points this paper is trying to address.

survival analysis
censoring
large language models
time-to-event prediction
clinical prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based survival analysis
censoring-aware modeling
pairwise ranking
end-to-end survival prediction
clinical tabular data