Cognitive bias in LLM reasoning compromises interpretation of clinical oncology notes

📅 2025-11-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit latent cognitive biases when interpreting clinical oncology notes, frequently producing unsafe outputs that reach correct conclusions through flawed reasoning, a failure mode that conventional accuracy metrics cannot detect. Method: We introduce the first three-tier taxonomy explicitly mapping computational errors to underlying cognitive biases. Using GPT-4 chain-of-thought responses to real-world prostate cancer multidisciplinary consultation records (n=822), we construct a reasoning-trajectory dataset and combine expert annotation with automated evaluation to systematically identify and categorize reasoning defects. Contribution/Results: 23% of LLM interpretations contain reasoning errors, most commonly confirmation bias and anchoring bias, both significantly associated with guideline nonadherence and potentially harmful recommendations. Our framework enables the first cognitive-bias-driven assessment of reasoning fidelity in oncology LLMs, establishing a generalizable methodology for safety validation and trustworthiness enhancement of clinical AI systems.
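The paper's tier labels are not reproduced on this page; as a rough illustration, a three-tier annotation linking an observable computational error, through an intermediate error category, to a hypothesized cognitive bias might be represented as below. All tier names and example entries are illustrative assumptions, not the authors' published taxonomy.

```python
from dataclasses import dataclass

# Illustrative sketch only: tier names and example entries are assumptions,
# not the taxonomy defined in the paper.

@dataclass(frozen=True)
class ReasoningError:
    computational_error: str   # Tier 1: observable failure in the CoT trace
    error_category: str        # Tier 2: intermediate grouping of the failure
    cognitive_bias: str        # Tier 3: hypothesized underlying bias

# Hypothetical example annotations for two reasoning traces
EXAMPLES = [
    ReasoningError(
        computational_error="ignored rising PSA noted later in the consult",
        error_category="selective evidence use",
        cognitive_bias="confirmation bias",
    ),
    ReasoningError(
        computational_error="fixed staging on first TNM mention despite restaging",
        error_category="premature commitment",
        cognitive_bias="anchoring bias",
    ),
]

for e in EXAMPLES:
    print(f"{e.computational_error} -> {e.error_category} -> {e.cognitive_bias}")
```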

📝 Abstract
Despite high performance on clinical benchmarks, large language models may reach correct conclusions through faulty reasoning, a failure mode with safety implications for oncology decision support that is not captured by accuracy-based evaluation. In this two-cohort retrospective study, we developed a hierarchical taxonomy of reasoning errors from GPT-4 chain-of-thought responses to real oncology notes and tested its clinical relevance. Using breast and pancreatic cancer notes from the CORAL dataset, we annotated 600 reasoning traces to define a three-tier taxonomy mapping computational failures to cognitive bias frameworks. We validated the taxonomy on 822 responses from prostate cancer consult notes spanning localized through metastatic disease, simulating extraction, analysis, and clinical recommendation tasks. Reasoning errors occurred in 23 percent of interpretations and dominated overall errors, with confirmation bias and anchoring bias most common. Reasoning failures were associated with guideline-discordant and potentially harmful recommendations, particularly in advanced disease management. Automated evaluators using state-of-the-art language models detected error presence but could not reliably classify subtypes. These findings show that large language models may provide fluent but clinically unsafe recommendations when reasoning is flawed. The taxonomy provides a generalizable framework for evaluating and improving reasoning fidelity before clinical deployment.
Problem

Research questions and friction points this paper is trying to address.

Detect reasoning errors in LLMs interpreting oncology notes
Associate cognitive biases with unsafe clinical recommendations
Develop taxonomy for evaluating reasoning fidelity pre-deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed hierarchical taxonomy of reasoning errors
Validated taxonomy across multiple cancer types
Automated evaluators detected error presence but not subtypes (see the sketch below)
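The last bullet suggests a two-stage LLM-as-judge pattern: a binary presence check that works reliably, and a subtype classification that does not. A minimal sketch of that pattern, assuming a hypothetical `call_judge_model` stand-in for whatever evaluator models the authors actually used:

```python
from typing import Optional

# Assumed subtype labels, mirroring the biases the paper reports most often.
BIAS_SUBTYPES = ["confirmation bias", "anchoring bias", "other"]

def call_judge_model(prompt: str) -> str:
    # Placeholder: a real pipeline would query an LLM evaluator here.
    # This canned answer lets the sketch run end to end.
    return "yes" if "ignored" in prompt else "no"

def evaluate_trace(trace: str) -> tuple[bool, Optional[str]]:
    # Stage 1: binary error-presence check (the step evaluators handled well).
    has_error = call_judge_model(
        f"Does this reasoning trace contain a reasoning error? {trace}"
    ).strip().lower().startswith("yes")
    if not has_error:
        return False, None
    # Stage 2: subtype classification (the step the paper reports
    # state-of-the-art evaluators could not do reliably).
    subtype = call_judge_model(
        f"Classify the bias in: {trace}. Options: {BIAS_SUBTYPES}"
    )
    return True, subtype if subtype in BIAS_SUBTYPES else "other"

if __name__ == "__main__":
    print(evaluate_trace("Model ignored the post-treatment PSA trend."))
```

Splitting detection from classification is what lets presence-checking remain useful even when subtype labels are too noisy to trust.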
Matthew W. Kenaston
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Umair Ayub
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Mihir Parmar
School of Computing and AI, Arizona State University, Tempe, AZ
Muhammad Umair Anjum
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Syed Arsalan Ahmed Naqvi
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Priya Kumar
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Samarth Rawal
Mayo Clinic College of Medicine and Science, Phoenix, AZ
Aadel A. Chaudhuri
Department of Radiation Oncology, Mayo Clinic, Rochester, MN
Yousef Zakharia
Mayo Clinic Comprehensive Cancer Center, Phoenix, AZ
Elizabeth I. Heath
Department of Oncology, Mayo Clinic, Rochester, MN
Tanios S. Bekaii-Saab
Mayo Clinic Comprehensive Cancer Center, Phoenix, AZ
Cui Tao
Department of AI and Informatics, Mayo Clinic
Knowledge Graph, Information Extraction, Ontology, ML/DL based EHR data analysis, Vaccine
Eliezer M. Van Allen
Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
Ben Zhou
Assistant Professor, Arizona State University
YooJung Choi
Assistant Professor, Arizona State University
Artificial Intelligence, Machine Learning, Probabilistic Circuits
Chitta Baral
Professor of Computer Science, Arizona State University
Knowledge Representation, NLP, Vision, Robotics, Integrated Systems
Irbaz Bin Riaz
Mayo Clinic