TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

📅 2026-03-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses diagnostic errors and hallucinations in clinical oncology analysis, which often arise from the absence of traceable, multimodal reasoning mechanisms. To this end, we introduce TumorCoT, the first large-scale multimodal chain-of-thought reasoning benchmark for oncology, and propose TumorChain, a framework that integrates 3D medical imaging, clinical text, and organ-level vision-language alignment through interleaved causal reasoning, enabling interpretable and traceable analysis from radiological findings to pathological predictions. The method combines 3D image encoding, clinical semantic understanding, and multi-turn reasoning with self-refinement, and it consistently outperforms strong baselines in lesion detection, clinical impression generation, and pathological classification. It also generalizes well to the DeepTumorVQA benchmark.
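The summary describes reasoning that moves from findings to impression to pathology and is refined over multiple turns. As a hedged illustration of that control flow only (not TumorChain's implementation, which couples a 3D image encoder with a language model), the Python sketch below runs three stage functions in order and repeats the pass, feeding each round the previous trace, until the trace stops changing or a round limit is reached. The names `Trace`, `staged_reasoning`, the stage callables, the toy stages, and the fixed-point stopping rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Trace:
    """One pass of findings -> impression -> pathology (hypothetical record)."""
    findings: str
    impression: str
    pathology: str


# Hypothetical stage signatures: every stage also receives the previous round's
# trace (None in the first round) so that it can revise its earlier output.
FindFn = Callable[[str, Optional[Trace]], str]
ImpressFn = Callable[[str, Optional[Trace]], str]
ClassifyFn = Callable[[str, str, Optional[Trace]], str]


def staged_reasoning(
    case_id: str,
    find: FindFn,
    impress: ImpressFn,
    classify: ClassifyFn,
    max_rounds: int = 3,
) -> Tuple[Optional[Trace], int]:
    """Repeat the three-stage pass until the trace stops changing or max_rounds is hit."""
    prev: Optional[Trace] = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        findings = find(case_id, prev)
        impression = impress(findings, prev)
        pathology = classify(findings, impression, prev)
        current = Trace(findings, impression, pathology)
        if current == prev:  # fixed point: this round reproduced the previous trace
            break
        prev = current
    return prev, rounds


# Toy stages for illustration only; a real system would query a multimodal model here.
def toy_find(case_id: str, prev: Optional[Trace]) -> str:
    return f"{case_id}: lesion noted in organ X"


def toy_impress(findings: str, prev: Optional[Trace]) -> str:
    # The first pass produces a draft; every later pass returns the same revision.
    return "draft impression" if prev is None else "revised impression"


def toy_classify(findings: str, impression: str, prev: Optional[Trace]) -> str:
    return "class B" if impression.startswith("revised") else "class A"


trace, rounds = staged_reasoning("case-001", toy_find, toy_impress, toy_classify, max_rounds=5)
print(trace.pathology, rounds)  # class B 3
```

Round 3 reproduces round 2, so the loop stops after three passes; the `max_rounds` cap guards against traces that never settle.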

πŸ“ Abstract
Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling evaluation of both answer accuracy and reasoning consistency. We further propose TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. Through cross-modal alignment and iterative interleaved causal reasoning, TumorChain grounds visual evidence, aggregates conclusions, and issues pathology predictions after multiple rounds of self-refinement, improving traceability and reducing hallucination risk. Experiments show consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrate strong generalization on the DeepTumorVQA benchmark. These results highlight the potential of multimodal reasoning for reliable and interpretable tumor analysis in clinical practice. Detailed information about our project can be found on our project homepage at https://github.com/ZJU4HealthCare/TumorChain.
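The abstract states that TumorCoT's step-aligned rationales allow evaluating both answer accuracy and reasoning consistency. The sketch below is a hypothetical illustration of why those are different quantities, not the benchmark's actual schema or metrics: the `Chain` record, the one-label-per-stage format, and the `chain_acc` proxy (final answer correct and every intermediate step correct) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stage names mirroring the findings -> impression -> pathology trajectory.
STAGES = ("findings", "impression", "pathology")


@dataclass
class Chain:
    """A reasoning chain reduced to one short, normalized label per stage (assumed format)."""
    steps: Dict[str, str]  # stage name -> label for that step
    answer: str            # final VQA answer


def evaluate(preds: List[Chain], refs: List[Chain]) -> Dict[str, float]:
    """Answer accuracy, per-stage accuracy, and a chain-level consistency proxy."""
    if not refs or len(preds) != len(refs):
        raise ValueError("preds and refs must be non-empty and of equal length")
    n = len(refs)
    answer_hits = 0
    chain_hits = 0
    stage_hits = {stage: 0 for stage in STAGES}
    for pred, ref in zip(preds, refs):
        answer_ok = pred.answer == ref.answer
        steps_ok = True
        for stage in STAGES:
            stage_ok = pred.steps.get(stage) == ref.steps[stage]  # a missing step counts as wrong
            stage_hits[stage] += int(stage_ok)
            steps_ok = steps_ok and stage_ok
        answer_hits += int(answer_ok)
        # A chain counts only if the answer AND every intermediate step are correct.
        chain_hits += int(answer_ok and steps_ok)
    metrics = {"answer_acc": answer_hits / n, "chain_acc": chain_hits / n}
    metrics.update({f"{stage}_acc": stage_hits[stage] / n for stage in STAGES})
    return metrics


refs = [
    Chain({"findings": "lesion", "impression": "suspicious", "pathology": "malignant"}, "malignant"),
    Chain({"findings": "lesion", "impression": "benign-appearing", "pathology": "benign"}, "benign"),
]
preds = [
    Chain({"findings": "lesion", "impression": "suspicious", "pathology": "malignant"}, "malignant"),
    # Right final answer reached through a wrong finding: accurate but not consistent.
    Chain({"findings": "no lesion", "impression": "benign-appearing", "pathology": "benign"}, "benign"),
]
print(evaluate(preds, refs))
```

On this toy data, answer accuracy is 1.0 while `chain_acc` and `findings_acc` are 0.5: the second prediction reaches the right answer through an incorrect finding, which an answer-only metric would not reveal.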
Problem

Research questions and friction points this paper is trying to address.

tumor analysis
multimodal reasoning
chain-of-thought
clinical traceability
pathology prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought Reasoning
Multimodal Interleaved Reasoning
3D Medical Imaging
Clinical Tumor Analysis
Cross-modal Alignment
Authors
Sijing Li (Zhejiang University): MLLM
Zhongwei Qiu (DAMO Academy, Alibaba Group; Zhejiang University): Computer Vision, Multimodal Learning, MLLM, AI for Healthcare
Jiang Liu (Zhejiang University): Multimodal intelligence
Wenqiao Zhang (Zhejiang University)
Tianwei Lin (Zhejiang University): MLLMs
Yihan Xie (Zhejiang University)
Jianxiang An (Zhejiang University)
Boxiang Yun (East China Normal University): Medical Image Processing
Chenglin Yang (Johns Hopkins University): Computer vision
Jun Xiao (Zhejiang University)
Guangyu Guo (Alibaba DAMO Academy): Computer Vision, Medical Image Analysis
Jiawen Yao (Alibaba DAMO Academy): Medical Image Analysis, Signal Processing, Deep Learning
Wei Liu (Alibaba Group): machine learning, medical image analysis
Yuan Gao (Staff Engineer, Alibaba Group, DAMO Academy): Machine Learning, Computer Vision, Medical Imaging, Fetal Ultrasound Analysis
Ke Yan (Staff Algorithm Engineer, Alibaba DAMO Academy): Deep learning, computer vision, medical image, machine learning
Weiwei Cao (Alibaba DAMO Academy, Zhejiang University): Medical Image Analysis, Vision and Language
Zhilin Zheng (DAMO Academy, Alibaba Group)
Tony C. W. Mok (Alibaba DAMO Academy): Medical image registration, Medical image analysis, Computer Vision, Deep learning
Kai Cao (Shanghai Institution of Pancreatic Disease)
Yu Shi (Shengjing Hospital): Magnetic resonance elastography
Jiuyu Zhang (Shengjing Hospital of China Medical University)
Jian Zhou (Sun Yat-sen University Cancer Center)
Beng Chin Ooi (Zhejiang University)
Yingda Xia (DAMO Academy, Alibaba Group)
Ling Zhang (Alibaba DAMO Academy USA): Medical Image Analysis, Medical Image Computing, Machine Learning, Image Processing