Trustworthy Agents for Electronic Health Records through Confidence Estimation

📅 2025-08-26
🤖 AI Summary
Large language models (LLMs) frequently generate hallucinations in clinical question answering over electronic health records (EHRs), undermining decision reliability. To address this, we propose TrustEHRAgent—a trustworthy agent architecture grounded in step-level confidence estimation. Its core innovation is a dynamic confidence-aware mechanism that enables interpretable calibration of the reasoning process, coupled with a novel evaluation metric, HCAcc@k%, which quantifies the trade-off between accuracy and reliability. Evaluated on MIMIC-III and eICU, TrustEHRAgent achieves absolute improvements of 44.23 and 25.34 percentage points in HCAcc@70%, respectively, significantly outperforming existing baselines. This work advances medical AI from a “high-accuracy” to a “high-trustworthiness” paradigm, delivering a verifiable and deployable trust framework for LLM-based clinical decision support.

📝 Abstract
Large language models (LLMs) show promise for extracting information from Electronic Health Records (EHR) and supporting clinical decisions. However, deployment in clinical settings faces challenges due to hallucination risks. We propose Hallucination Controlled Accuracy at k% (HCAcc@k%), a novel metric quantifying the accuracy-reliability trade-off at varying confidence thresholds. We introduce TrustEHRAgent, a confidence-aware agent incorporating stepwise confidence estimation for clinical question answering. Experiments on MIMIC-III and eICU datasets show TrustEHRAgent outperforms baselines under strict reliability constraints, achieving improvements of 44.23%p and 25.34%p at HCAcc@70% while baseline methods fail at these thresholds. These results highlight limitations of traditional accuracy metrics in evaluating healthcare AI agents. Our work contributes to developing trustworthy clinical agents that deliver accurate information or transparently express uncertainty when confidence is low.
Problem

Research questions and friction points this paper is trying to address.

Addressing hallucination risks in LLMs for EHR analysis
Proposing confidence-aware metrics for clinical decision reliability
Developing trustworthy agents for accurate healthcare information extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepwise confidence estimation for clinical QA
Novel HCAcc@k% metric for reliability-accuracy tradeoff
Confidence-aware agent outperforms baselines under strict constraints
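The HCAcc@k% metric above can be sketched in a few lines. This is a minimal illustration of one plausible reading of the metric, not the paper's exact definition: predictions made with confidence below the threshold k% are treated as abstentions and scored as incorrect, so the metric rewards agents that only answer when they are confident. The `predictions` format (dicts with `confidence` and `correct` fields) is an assumption for illustration.

```python
def hcacc_at_k(predictions, k):
    """Hypothetical sketch of HCAcc@k%.

    predictions: list of dicts with keys
        "confidence" (float in [0, 1]) and "correct" (bool).
    k: confidence threshold as a percentage (e.g. 70 for HCAcc@70%).

    An answer counts only if the agent was at least k% confident AND
    correct; low-confidence answers are treated as abstentions and
    scored as incorrect. This penalizes confidently wrong answers and
    rewards transparent uncertainty.
    """
    if not predictions:
        return 0.0
    threshold = k / 100.0
    confident_and_correct = sum(
        1 for p in predictions
        if p["confidence"] >= threshold and p["correct"]
    )
    return confident_and_correct / len(predictions)
```

For example, an agent that answers four questions, two of them correctly and with confidence above 0.7, scores HCAcc@70% = 0.5 under this reading, regardless of how its low-confidence answers turned out.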