Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent

📅 2026-03-11
🤖 AI Summary
This study addresses the diagnostic accuracy bottleneck in clinical practice caused by limited physician experience and disease rarity by proposing PULSE, a medical reasoning agent that integrates a domain-finetuned large language model with scientific literature retrieval to support diagnostic decision-making for complex endocrine cases. Two human-AI collaboration mechanisms—serial and parallel—are designed and evaluated. On a real-world benchmark of 82 endocrine cases, PULSE achieves expert-level performance in both Top@1 and Top@4 accuracy, significantly outperforming residents and junior specialists, with robust results on rare diseases. Its diagnostic output length adaptively scales with case complexity. This work provides the first validation on a real clinical benchmark that AI can attain expert-level diagnostic capability and demonstrates that human-AI collaboration can correct misdiagnoses, though it also highlights the need to mitigate automation bias.

📝 Abstract
We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases. To evaluate its capabilities, we curated a benchmark of 82 authentic endocrinology case reports encompassing a broad spectrum of disease types and incidence levels. In controlled experiments, we compared PULSE's performance against physicians with varying levels of expertise, from residents to senior specialists, and examined how AI assistance influenced human diagnostic reasoning. PULSE attained expert-competitive accuracy, outperforming residents and junior specialists while matching senior specialist performance at both Top@1 and Top@4 thresholds. Unlike physicians, whose accuracy declined with disease rarity, PULSE maintained stable performance across incidence tiers. The agent also exhibited adaptive reasoning, increasing output length with case difficulty in a manner analogous to the longer deliberation observed among expert clinicians. When used collaboratively, PULSE enabled physicians to correct initial errors and broaden diagnostic hypotheses, but also introduced risks of automation bias. The study explores both serial and concurrent collaboration workflows, revealing that PULSE offers robust support across common and rare presentations. These findings underscore both the promise and the limitations of language model-based agents in clinical diagnosis, and offer a framework for evaluating their role in real-world decision-making.
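
The Top@1 and Top@4 thresholds mentioned above are commonly read as top-k accuracy: a case counts as correct if the reference diagnosis appears among the first k ranked candidates. The sketch below illustrates that reading only. The function name `top_k_accuracy`, the case-insensitive exact string match, and the toy cases are all assumptions made for illustration; the abstract does not state how the study matched a predicted diagnosis to the reference, and none of the values come from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k candidates.

    Assumption: a hit is a case-insensitive exact string match. A real
    evaluation would likely need clinician review or synonym mapping.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(references):
        raise ValueError("need exactly one reference per case")
    if not references:
        raise ValueError("need at least one case")

    hits = 0
    for candidates, reference in zip(ranked_predictions, references):
        # Only the k highest-ranked candidates are considered.
        top_k = {c.strip().lower() for c in candidates[:k]}
        if reference.strip().lower() in top_k:
            hits += 1
    return hits / len(references)


# Toy data, invented for illustration (not from the benchmark).
toy_predictions = [
    ["Cushing syndrome", "Adrenal adenoma"],
    ["Graves disease", "Thyroiditis", "Insulinoma"],
    ["Pheochromocytoma"],
]
toy_references = ["cushing syndrome", "Insulinoma", "Addison disease"]

top1 = top_k_accuracy(toy_predictions, toy_references, k=1)
top4 = top_k_accuracy(toy_predictions, toy_references, k=4)
```

On the toy data, only the first case is a hit at k=1, while the second case becomes a hit at k=4 because its reference is ranked third. This is the sense in which widening the threshold rewards a broader differential.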
Problem

Research questions and friction points this paper is trying to address.

Human-AI co-reasoning
clinical diagnosis
diagnostic decision-making
automation bias
rare diseases
Innovation

Methods, ideas, or system contributions that make the work stand out.

human-AI co-reasoning
evidence-integrated language agent
clinical diagnosis
adaptive reasoning
automation bias
Zhongzhen Huang
Shanghai Jiao Tong University
Medical Image Analysis, Vision and Language
Yan Ling
Oakland University
Strategic Management, top management team, CEO
Hong Chen
Department of Endocrinology, Zhongshan Hospital, Fudan University, Shanghai, China
Ye Feng
Department of Endocrinology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
Li Wu
Qinghai University
spatiotemporal prediction, uncertainty analysis
Linjie Mu
Shanghai Jiao Tong University, Shanghai, China
Shaoting Zhang
Shanghai AI Lab; SenseTime Research
Medical Image Analysis, Computer Vision, Foundation Models
Xiaofan Zhang
Associate Professor at Shanghai Jiao Tong University
Medical Image Analysis
Kun Qian
HKUST/NEU/Sichuan University
spintronics, magnetic devices
Xiaomu Li
Department of Endocrinology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China