"Do I Trust the AI?"Towards Trustworthy AI-Assisted Diagnosis: Understanding User Perception in LLM-Supported Reasoning

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

career value

181K/year

🤖 AI Summary

This study addresses the misalignment between clinicians’ trust in large language models (LLMs) for clinical diagnosis and actual model performance, often distorted by perceptual biases that impede effective AI integration into clinical workflows. For the first time, it systematically investigates physicians’ subjective perceptions of LLM-generated clinical reasoning by presenting them with diagnostic cases accompanied by LLM explanations, which were then evaluated for credibility by 37 clinicians. Quantitative analysis comparing these human judgments against standard benchmark metrics reveals a significant discrepancy: clinicians prioritize reasoning dimensions such as logical coherence and evidential support, which are poorly captured by conventional evaluation frameworks. These findings underscore the limitations of current assessment methodologies and provide empirical grounding for a new paradigm in developing trustworthy, clinically aligned AI-assisted diagnostic systems centered on human trust.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians'difficulties in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations primarily emphasize standardized benchmarks and predefined tasks, offering limited insights into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians'perceptions of LLMs'clinical reasoning capability. In this work, we investigate how physicians perceive LLMs'capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing the perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss the implications of opportunities for enhancing trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.

Problem

Research questions and friction points this paper is trying to address.

trustworthy AI

clinical reasoning

user perception

LLM-assisted diagnosis

human-AI collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

trustworthy AI

clinical reasoning

human-AI collaboration