"Do I Trust the AI?"Towards Trustworthy AI-Assisted Diagnosis: Understanding User Perception in LLM-Supported Reasoning

📅 2026-01-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the misalignment between clinicians’ trust in large language models (LLMs) for clinical diagnosis and actual model performance, often distorted by perceptual biases that impede effective AI integration into clinical workflows. For the first time, it systematically investigates physicians’ subjective perceptions of LLM-generated clinical reasoning by presenting them with diagnostic cases accompanied by LLM explanations, which were then evaluated for credibility by 37 clinicians. Quantitative analysis comparing these human judgments against standard benchmark metrics reveals a significant discrepancy: clinicians prioritize reasoning dimensions such as logical coherence and evidential support, which are poorly captured by conventional evaluation frameworks. These findings underscore the limitations of current assessment methodologies and provide empirical grounding for a new paradigm in developing trustworthy, clinically aligned AI-assisted diagnostic systems centered on human trust.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians'difficulties in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insights into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians'perceptions of LLMs'clinical reasoning capability. In this work, we investigate how physicians perceive LLMs'capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing the perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss the implications of opportunities for enhancing trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.
Problem

Research questions and friction points this paper is trying to address.

trustworthy AI
clinical reasoning
user perception
LLM-assisted diagnosis
human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

trustworthy AI
clinical reasoning
human-AI collaboration
large language models
user perception
Y
Yuansong Xu
School of Information Science and Technology, ShanghaiTech University
Y
Yichao Zhu
School of Information Science and Technology, ShanghaiTech University
H
Haokai Wang
School of Information Science and Technology, ShanghaiTech University
Yuchen Wu
Yuchen Wu
ShanghaiTech University
Human-Computer InteractionData Visualization
O
Ouyang Yang
School of Information Science and Technology, ShanghaiTech University
H
Hanlu Li
Shanghai Clinical Research and Trial Center, ShanghaiTech University
W
Wenzhe Zhou
Shanghai Clinical Research and Trial Center, ShanghaiTech University
X
Xinyu Liu
Shanghai Clinical Research and Trial Center, ShanghaiTech University
C
Chang Jiang
Shanghai Clinical Research and Trial Center, ShanghaiTech University
Quan Li
Quan Li
Tenure-Track Assistant Professor, ShanghaiTech University
Explainable Machine LearningSocial MediaData VisualizationVisual Analyticsand Human-Computer