VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records

📅 2025-01-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the lack of automated factual verification for clinical text generated by large language models (LLMs), this paper proposes VeriFact, an EHR-driven framework that combines retrieval-augmented generation (RAG) with the LLM-as-a-Judge paradigm to check whether generated statements are supported by a patient's electronic health record (EHR). The authors also introduce VeriFact-BHC, a new benchmark that decomposes Brief Hospital Course narratives from discharge summaries into simple statements, each annotated by clinicians for whether it is supported by the patient's EHR clinical notes. On VeriFact-BHC, VeriFact reaches up to 92.7% agreement with a denoised, adjudicated clinician ground truth, compared with a highest inter-clinician agreement of 88.5%. The work targets a current evaluation bottleneck for LLM-based EHR applications.
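The agreement figures above compare per-statement labels against a reference labeling. As a rough illustration only, the sketch below builds a reference by majority vote over annotators and computes simple percent agreement. The label names, the majority-vote step, and both function names are assumptions made for this example; the paper's actual denoising and adjudication procedure is not described here and may differ.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among one statement's annotator votes.

    Hypothetical stand-in for the adjudication step; use an odd number of
    annotators so that ties cannot occur.
    """
    return Counter(votes).most_common(1)[0][0]


def percent_agreement(predicted, reference):
    """Percentage of statements on which two label sequences match."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("label sequences must be non-empty and equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Toy data: three annotators label four statements (labels are illustrative).
annotations = [
    ["supported", "supported", "not_supported"],
    ["not_supported", "not_supported", "not_supported"],
    ["supported", "not_supported", "supported"],
    ["supported", "supported", "supported"],
]
reference = [majority_label(votes) for votes in annotations]
system = ["supported", "not_supported", "not_supported", "supported"]

print(percent_agreement(system, reference))  # prints 75.0
```

Percent agreement is the simplest such measure; it does not correct for agreement expected by chance.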

📝 Abstract

Methods to ensure factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check text against a patient's medical record. VeriFact may accelerate the development of LLM-based EHR applications by removing current evaluation bottlenecks.
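The flow described in the abstract (decompose text into simple statements, retrieve relevant EHR content, then have a judge decide whether each statement is supported) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: every function name is hypothetical, sentence splitting stands in for statement decomposition, token overlap stands in for retrieval, and a rule-based check stands in for the LLM judge.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    statement: str
    label: str      # "supported" or "not_supported" (illustrative labels)
    evidence: list


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(narrative):
    """Split a narrative into statements (naive sentence split as a stand-in)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", narrative) if s.strip()]


def retrieve(statement, ehr_facts, k=2):
    """Return the k EHR facts sharing the most tokens with the statement.

    Stand-in for the retrieval step; a real system would likely use a
    search index or embeddings instead of token overlap.
    """
    query = tokens(statement)
    ranked = sorted(ehr_facts, key=lambda f: len(query & tokens(f)), reverse=True)
    return ranked[:k]


def judge(statement, evidence):
    """Toy judge standing in for an LLM call.

    Marks a statement supported only if every one of its tokens
    appears somewhere in the retrieved evidence.
    """
    seen = set().union(*(tokens(e) for e in evidence))
    query = tokens(statement)
    return "supported" if query and query <= seen else "not_supported"


def verify(narrative, ehr_facts):
    """Run decompose -> retrieve -> judge for each statement."""
    verdicts = []
    for statement in decompose(narrative):
        evidence = retrieve(statement, ehr_facts)
        verdicts.append(Verdict(statement, judge(statement, evidence), evidence))
    return verdicts


# Invented example data, not taken from any real record.
ehr = [
    "Patient was admitted with pneumonia.",
    "Patient received ceftriaxone on day 1.",
]
draft = "Patient was admitted with pneumonia. Patient received vancomycin."
for v in verify(draft, ehr):
    print(v.label, "|", v.statement)
```

Keeping retrieval and judging as separate functions means either stand-in can be swapped for a real component without touching the rest of the loop.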

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Medical Article Generation

Electronic Health Records Accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

VeriFact

VeriFact-BHC

EHR processing

🔎 Similar Papers

Factual consistency evaluation of summarization in the Era of large language models