Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning

📅 2026-04-20

🤖 AI Summary
This work proposes a visual-instruction-tuned vision-language framework for 3D CT image understanding. It targets two limitations of conventional survival prediction: the reliance on expert interpretation of CT scans, and the information lost when rich visual findings are condensed into text. The model is pre-trained on large-scale open-source CT images paired with radiology reports to learn clinically meaningful visual-textual representations, then extended with a survival prediction head so that it predicts survival from CT images and clinical data while generating language responses to predefined questions. The approach outperforms baseline methods in survival prediction, particularly when clinical data alone is less predictive.

📝 Abstract
Accurate prognostication and risk estimation are essential for guiding clinical decision-making and optimizing patient management. While radiologist-assessed features from CT scans provide valuable indicators of disease severity and outcomes, interpreting such images requires expert knowledge, and translating rich visual information into textual summaries inevitably leads to information loss. In this work, we propose a vision-language framework for 3D CT image understanding that leverages large-scale open-source CT images paired with radiology reports through visual instruction tuning. This pre-training enables the model to learn clinically meaningful visual-textual representations, which can then be adapted to downstream survival prediction tasks. By incorporating a survival prediction head on top of the pre-trained model, our approach improves survival prediction from CT images and clinical data while generating clinically meaningful language responses to predefined questions. Experimental results demonstrate that our method outperforms baseline methods in survival prediction, particularly when clinical data alone is less predictive. The code will be released upon acceptance.
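
The abstract does not say what form the survival prediction head takes or which loss it is trained with. As a minimal sketch only, the example below assumes a linear head over the concatenated image embedding and clinical features, trained with the negative Cox partial log-likelihood, which is a common choice for survival models. The function names `risk_score` and `cox_neg_log_partial_likelihood`, the concatenation-based fusion, and the Cox objective are all illustrative assumptions, not the authors' implementation.

```python
import math


def risk_score(image_embedding, clinical_features, weights, bias=0.0):
    """Hypothetical linear survival head: fuse inputs by concatenation
    and return a scalar log-risk (higher means shorter expected survival)."""
    fused = list(image_embedding) + list(clinical_features)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    return sum(w * x for w, x in zip(weights, fused)) + bias


def cox_neg_log_partial_likelihood(risks, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risks  : predicted log-risk per patient
    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for r_i, t_i, e_i in zip(risks, times, events):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        at_risk = [r_j for r_j, t_j in zip(risks, times) if t_j >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += r_i - log_sum
        n_events += 1
    if n_events == 0:
        return 0.0
    return -total / n_events


# Toy usage: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

In the toy usage, `good` assigns the highest risk to the patient who dies first and therefore yields a lower loss than `bad`, which reverses that ordering. In a real pipeline the image embedding would come from the pre-trained vision-language model rather than a hand-written list.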
Problem

Research questions and friction points this paper is trying to address.

survival prediction
medical image understanding
CT imaging
visual-textual representation
prognostication
Innovation

Methods, ideas, or system contributions that make the work stand out.

visual instruction tuning
vision-language model
3D CT image understanding
survival prediction
multimodal pre-training
Xixi Liu
Chalmers University of Technology, Gothenburg, Sweden
Jorge Lazo
Chalmers University of Technology, Gothenburg, Sweden
Andreas Hallqvist
Chalmers University of Technology, Gothenburg, Sweden
Mikael Johansson
PhD, Division of Design and Human Factors, Chalmers University of Technology
Mental Models, Automated Vehicles, Human-Machine Interaction
Åse Johnsson
Chalmers University of Technology, Gothenburg, Sweden
Jonas S Andersson
Chalmers University of Technology, Gothenburg, Sweden
Ella Äng Eklund
Chalmers University of Technology, Gothenburg, Sweden
Patrik Sund
Chalmers University of Technology, Gothenburg, Sweden
Nasser Hosseini
Chalmers University of Technology, Gothenburg, Sweden
Jennifer Alvén
Chalmers University of Technology, Gothenburg, Sweden
Ida Häggström
Chalmers University of Technology, Gothenburg, Sweden