LungCURE: Benchmarking Multimodal Real-World Clinical Reasoning for Precision Lung Cancer Diagnosis and Treatment

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a gap in current large language models: they struggle with guideline-adherent, multi-stage reasoning in lung cancer diagnosis and treatment. The authors formalize three core tasks of precision lung cancer treatment (TNM staging, treatment recommendation, and end-to-end clinical decision support), introduce LungCURE, described as the first standardized multimodal clinical benchmark and built from 1,000 real-world multicenter cases, and propose LCAgent, a multi-agent framework that combines multimodal large language models with clinical guideline constraints. Collaborative reasoning among the agents is intended to suppress cascading errors along the clinical pathway. Experiments show large performance differences among models on complex medical reasoning, and indicate that LCAgent, used as a plug-and-play module, improves the accuracy of end-to-end clinical decisions.
📝 Abstract
Lung cancer clinical decision support demands precise reasoning across complex, multi-stage oncological workflows. Existing multimodal large language models (MLLMs) fail to handle guideline-constrained staging and treatment reasoning. We formalize three oncological precision treatment (OPT) tasks for lung cancer: TNM staging, treatment recommendation, and end-to-end clinical decision support. We introduce LungCURE, the first standardized multimodal benchmark built from 1,000 real-world, clinician-labeled cases across more than 10 hospitals. We further propose LCAgent, a multi-agent framework that ensures guideline-compliant lung cancer clinical decision-making by suppressing cascading reasoning errors across the clinical pathway. Experiments reveal large differences among large language models (LLMs) in complex medical reasoning under precise treatment requirements. We further verify that LCAgent, as a simple yet effective plugin, enhances the reasoning performance of LLMs in real-world medical scenarios.
Problem

Research questions and friction points this paper is trying to address.

lung cancer
clinical reasoning
multimodal benchmark
precision treatment
guideline-constrained reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
clinical reasoning
lung cancer
multi-agent framework
guideline-compliant decision-making
Fangyu Hao
Beijing Univ. Posts & Telecommun.
Jiayu Yang
The Australian National University
3D Computer Vision, 3D AIGC, 3D Reconstruction, Multi-view Stereo, VR AR XR
Yifan Zhu
Beijing University of Posts and Telecommunications
PEFT of LLMs, Graph RAG, Graph mining
Zijun Yu
Beijing Univ. Posts & Telecommun.
Qicen Wu
Beijing Univ. Posts & Telecommun.
Wang Yunlong
Beijing Univ. Posts & Telecommun.
Jiawei Li
Beijing Univ. Posts & Telecommun.
Yulin Liu
Beijing Univ. Posts & Telecommun.
Xu Zeng
Beijing Univ. Posts & Telecommun.
Guanting Chen
Beijing Univ. Posts & Telecommun.
Shihao Li
Beijing Univ. Posts & Telecommun.
Zhonghong Ou
School of Computer Science, Beijing University of Posts and Telecommunications (BUPT), China
Computer Vision, Deep Learning, Machine Learning, Big Data Analytics
Meina Song
Professor of Computer Science, Beijing University of Posts and Telecommunications
data science
Mengyang Sun
Northwestern Polytechnical University
computer vision, vision-language interaction
Haoran Luo
Nanyang Technological University
Knowledge Graph, Large Language Models, Graph Neural Networks
Yu Shi
Peking Union Med. Coll. Hosp.
Yingyi Wang
Peking Union Med. Coll. Hosp.