Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models

📅 2025-11-23

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

This study addresses distribution shift in clinical text caused by pervasive human errors and automated noise, which undermines the reliability and fairness of large language models (LLMs) in “next-visit diagnosis prediction.” We propose two core innovations: (1) a clinical-knowledge-guided label space compression mechanism to mitigate prediction uncertainty induced by noisy labels; and (2) a hierarchical chain-of-thought (CoT) reasoning strategy to enhance model robustness against degraded inputs and improve performance parity across demographic subgroups. Evaluated on multi-source real-world clinical text datasets, our approach significantly improves overall accuracy under noise (+5.2%) and substantially enhances subgroup performance stability—reducing inter-group standard deviation by 38%. The method offers an interpretable, deployable paradigm for fair and reliable LLM-driven clinical decision support.

Technology Category

Application Category

📝 Abstract

A decade of rapid advances in artificial intelligence (AI) has opened new opportunities for clinical decision support systems (CDSS), with large language models (LLMs) demonstrating strong reasoning abilities on timely medical tasks. However, clinical texts are often degraded by human errors or failures in automated pipelines, raising concerns about the reliability and fairness of AI-assisted decision-making. Yet the impact of such degradations remains under-investigated, particularly regarding how noise-induced shifts can heighten predictive uncertainty and unevenly affect demographic subgroups. We present a systematic study of state-of-the-art LLMs under diverse text corruption scenarios, focusing on robustness and equity in next-visit diagnosis prediction. To address the challenge posed by the large diagnostic label space, we introduce a clinically grounded label-reduction scheme and a hierarchical chain-of-thought (CoT) strategy that emulates clinicians' reasoning. Our approach improves robustness and reduces subgroup instability under degraded inputs, advancing the reliable use of LLMs in CDSS. We release code at https://github.com/heejkoo9/NECHOv3.

Problem

Research questions and friction points this paper is trying to address.

Addressing reliability and fairness concerns in AI-assisted clinical decision-making

Investigating noise-induced shifts in predictive uncertainty across demographic subgroups

Improving robustness of next-visit diagnosis prediction with degraded clinical texts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical chain-of-thought strategy mimics clinical reasoning

Clinically grounded label-reduction scheme handles diagnostic complexity

System improves robustness and fairness under text corruption

🔎 Similar Papers

Large Language Models for Disease Diagnosis: A Scoping Review