🤖 AI Summary
Large language models (LLMs) frequently generate hallucinations in clinical question answering over electronic health records (EHRs), undermining decision reliability. To address this, we propose TrustEHRAgent—a trustworthy agent architecture grounded in step-level confidence estimation. Its core innovation is a dynamic confidence-aware mechanism that enables interpretable calibration of the reasoning process, coupled with a novel evaluation metric, HCAcc@k%, which quantifies the trade-off between accuracy and reliability. Evaluated on MIMIC-III and eICU, TrustEHRAgent achieves absolute improvements of 44.23 and 25.34 percentage points in HCAcc@70%, respectively, significantly outperforming existing baselines. This work advances medical AI from a “high-accuracy” to a “high-trustworthiness” paradigm, delivering a verifiable and deployable trust framework for LLM-based clinical decision support.
📝 Abstract
Large language models (LLMs) show promise for extracting information from electronic health records (EHRs) and supporting clinical decisions. However, deployment in clinical settings faces challenges due to hallucination risks. We propose Hallucination Controlled Accuracy at k% (HCAcc@k%), a novel metric quantifying the accuracy-reliability trade-off at varying confidence thresholds. We introduce TrustEHRAgent, a confidence-aware agent incorporating stepwise confidence estimation for clinical question answering. Experiments on the MIMIC-III and eICU datasets show that TrustEHRAgent outperforms baselines under strict reliability constraints, achieving improvements of 44.23%p and 25.34%p, respectively, at HCAcc@70%, a threshold at which baseline methods fail. These results highlight the limitations of traditional accuracy metrics in evaluating healthcare AI agents. Our work contributes to developing trustworthy clinical agents that deliver accurate information or transparently express uncertainty when confidence is low.
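The abstract describes HCAcc@k% as accuracy under a confidence threshold but does not spell out the computation. Under one plausible reading — the agent answers only when its confidence is at least k%, abstains otherwise, and abstentions count against accuracy — the metric could be sketched as below. The function name, input format, and abstention handling are assumptions for illustration, not the paper's formal definition:

```python
def hcacc_at_k(results, k):
    """Sketch of HCAcc@k% under an assumed definition.

    results: list of (confidence, is_correct) pairs, one per question,
             with confidence in [0, 1].
    k:       confidence threshold in percent.

    The agent "answers" only when confidence >= k/100; low-confidence
    questions are treated as abstentions. Correct answers are divided
    by the TOTAL number of questions, so abstentions lower the score.
    """
    threshold = k / 100
    # Keep only questions the agent is confident enough to answer.
    answered = [(conf, ok) for conf, ok in results if conf >= threshold]
    correct = sum(1 for _, ok in answered if ok)
    # Denominator is all questions, not just answered ones.
    return correct / len(results)
```

Under this reading, a higher k trades coverage for reliability: a well-calibrated agent keeps its score as k rises, while an overconfident one loses correct-but-uncertain answers and retains confident hallucinations.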