Towards end-to-end LLM-based censoring-aware survival analysis

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of applying large language models (LLMs) to survival analysis, where censored outcomes prevent straightforward supervised fine-tuning. The authors propose LLMSurvival, a framework that enables off-the-shelf LLMs to perform censoring-aware survival prediction without architectural modifications. It reformulates time-to-event modeling as a pairwise ranking task among comparable subjects, and estimates risk for a test sample directly from clinical tabular data by aggregating its comparison outcomes against anchor individuals from the training set. LLMSurvival is portable across clinical tasks and supports local deployment with compact, publicly available base models. In evaluations on ICU mortality and fragility fracture prediction, it improves concordance (C-index) over the Cox model by 3.1% and 0.5%, respectively, and also outperforms three deep learning-based survival models and established clinical scoring systems such as SAPS-II and FRAX.
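For readers unfamiliar with the metric reported above, the following is a minimal sketch of Harrell's concordance index (C-index) under right censoring. It is not taken from the paper: the function name `concordance_index`, the handling of tied times (skipped), and the half credit for tied risk scores are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable pairs whose predicted risks are ordered correctly.

    times  : observed follow-up time per subject
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5  # assumption: tied risks earn half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: subject 1 is censored at t=4, so it can only be compared
# with subject 0, whose event happened earlier.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
print(concordance_index(times, events, [0.9, 0.2, 0.1, 0.5]))
```

In the toy data, four pairs are comparable and three of them are ranked correctly, so the score is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.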

📝 Abstract

Objective: Survival analysis is central to medical prediction, yet large language models (LLMs) are rarely used as end-to-end survival models because censoring prevents straightforward supervised fine-tuning. Here we present LLMSurvival, a framework that enables censoring-aware survival analysis with unmodified LLMs operating directly on tabular clinical data. Materials and Methods: LLMSurvival reformulates time-to-event prediction as pairwise ranking among comparable subjects, and derives test-time risk by aggregating comparisons against anchor individuals from the training cohort. Results: Across two clinical tasks (ICU mortality prediction in MIMIC-IV and fragility fracture prediction in a NewYork-Presbyterian/Weill Cornell Medicine cohort), LLMSurvival improves overall concordance over Cox proportional hazards modeling by 3.1% for ICU mortality and 0.5% for fracture risk, and over three established deep learning survival models by an average of 2.1% for ICU mortality and 2.8% for fracture risk. Discussion: The results show that survival modeling with censoring can be made compatible with LLM fine-tuning through comparison-based reformulation. The framework is highly portable and outperforms expert-curated scores such as SAPS-II and FRAX across diverse clinical contexts. Furthermore, it supports local deployment, as compact, publicly available base models provide sufficient performance. Conclusion: The LLMSurvival framework serves as a proof of concept for an integrated, censoring-aware approach to survival analysis via LLMs.
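The comparison-based reformulation described in the abstract can be illustrated with a small sketch. The abstract does not give the exact comparability rule or aggregation formula, so everything below is an assumption: a pair is treated as comparable when the subject with the shorter follow-up had an observed event (the usual C-index convention), and test-time risk is taken as the mean comparison score against the anchors. The names `is_comparable`, `training_pairs`, `anchor_risk`, and the `compare` callable (a stand-in for the fine-tuned LLM) are hypothetical.

```python
def is_comparable(t_a, e_a, t_b, e_b):
    """True if the event order of two subjects is known despite censoring."""
    if t_a < t_b:
        return bool(e_a)  # a left first; order is known only if a had the event
    if t_b < t_a:
        return bool(e_b)
    return False  # assumption: tied follow-up times are dropped


def training_pairs(cohort):
    """Build (id_a, id_b, label) tuples; label 1 means a had the event first.

    cohort is a list of (subject_id, time, event) tuples.
    """
    pairs = []
    for i in range(len(cohort)):
        for j in range(i + 1, len(cohort)):
            id_a, t_a, e_a = cohort[i]
            id_b, t_b, e_b = cohort[j]
            if is_comparable(t_a, e_a, t_b, e_b):
                pairs.append((id_a, id_b, 1 if t_a < t_b else 0))
    return pairs


def anchor_risk(compare, test_subject, anchors):
    """Average the comparison scores of a test subject against all anchors.

    compare(x, a) is a hypothetical stand-in for the model: it returns a score
    in [0, 1] that x experiences the event earlier than anchor a.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    return sum(compare(test_subject, a) for a in anchors) / len(anchors)


# Toy usage with a rule-based comparator in place of an LLM.
cohort = [("p1", 5, 1), ("p2", 8, 0), ("p3", 3, 0)]
print(training_pairs(cohort))  # only p1 vs p2 is comparable

older_is_riskier = lambda x, a: 1.0 if x["age"] > a["age"] else 0.0
anchors = [{"age": 50}, {"age": 60}, {"age": 70}]
print(anchor_risk(older_is_riskier, {"age": 65}, anchors))
```

In the toy cohort, `p3` is censored before either other subject's follow-up ends, so no ordering involving `p3` is known and those pairs are excluded from training. Averaging over anchors turns many binary comparisons into a single score that can be ranked across test subjects.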

Problem

Research questions and friction points this paper is trying to address.

survival analysis

censoring

large language models

time-to-event prediction

clinical prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based survival analysis

censoring-aware modeling

pairwise ranking