CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

📅 2026-05-05
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses “faithfulness hallucinations” produced by large language models (LLMs) when generating hospital discharge summaries: statements that contradict the electronic health records (EHRs) and jeopardize patient safety. The authors propose a multi-agent framework that integrates GraphRAG with a four-tier evidence classification system (E1–E4, from strong support to direct contradiction). It constructs patient-level knowledge graphs from EHRs to detect hallucinations at the sentence level and to generate interpretable evidence chains. Combining multi-agent collaboration, structured evidence retrieval, and a fine-tuned Qwen3-14B detection model, the approach achieves an F1 of 0.831 (90.9% recall, 76.5% precision) on E4-class hallucinations and 0.823 on E3+E4, evaluated on 50 held-out patients from a 250-patient Discharge-Me subset. The authors report this as a 50.0% relative improvement over the base model, with gains over RAGTruth-style and QAGS-style baselines. The work also contributes a reusable annotated dataset for downstream training and distillation.
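The reported E4 figures are internally consistent: F1 is the harmonic mean of precision and recall, so 76.5% precision and 90.9% recall reproduce the 0.831 F1 up to rounding. A minimal Python check (the helper name `f1_score` is ours, not the paper's):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# E4 figures as reported: precision 76.5%, recall 90.9%.
e4_f1 = f1_score(0.765, 0.909)
print(f"E4 F1 = {e4_f1:.3f}")  # E4 F1 = 0.831
```

Because the inputs are themselves rounded percentages, agreement to three decimals is the most this check can confirm.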
📝 Abstract
Discharge summaries require extracting critical information from lengthy electronic health records (EHRs), a process that is labor-intensive when performed manually. Large language models (LLMs) can improve generation efficiency; however, they are prone to producing faithfulness hallucinations (statements that contradict source records), which pose direct risks to patient safety. To address this, we present CuraView, a multi-agent framework for sentence-level detection and evidence-grounded explanation of faithfulness hallucinations in discharge summaries. CuraView constructs a GraphRAG-based knowledge graph from patient-level EHRs and implements a closed-loop generation-detection pipeline with sentence-level evidence retrieval and classification spanning four evidence grades, from strong support to direct contradiction (E1-E4), yielding structured and interpretable evidence chains. We evaluate CuraView on a subset of 250 patients from the Discharge-Me benchmark, with 50 patients held out for testing. Our fine-tuned Qwen3-14B detection model achieves an F1 of 0.831 on the safety-critical E4 metric (90.9% recall, 76.5% precision) and an F1 of 0.823 on E3+E4, representing a 50.0% relative improvement over the base model and outperforming RAGTruth-style and QAGS-style baselines. These results demonstrate that evidence-chain-based graph retrieval verification substantially improves the factual reliability of clinical documentation, while simultaneously producing reusable annotated datasets for downstream model training and distillation.
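The abstract reports detection quality both for E4 alone and for E3+E4. One plausible reading, assumed here rather than taken from the paper, is that sentences whose grade falls in the chosen set form the positive (hallucination) class and all other sentences are negative. The sketch below computes precision, recall, and F1 under that assumption. The `EvidenceGrade` and `binary_prf` names, the glosses for E2 and E3, and the toy labels are hypothetical; only the E1 (strong support) and E4 (direct contradiction) endpoints come from the abstract.

```python
from enum import IntEnum


class EvidenceGrade(IntEnum):
    # E1 and E4 follow the abstract; the E2/E3 glosses are assumptions.
    E1 = 1  # strong support from the patient's EHR
    E2 = 2  # assumed: weaker or partial support
    E3 = 3  # assumed: unsupported / insufficient evidence
    E4 = 4  # direct contradiction of the EHR


def binary_prf(gold, pred, positive):
    """Precision, recall and F1 when grades in `positive` form the hallucination class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g in positive and p in positive)
    fp = sum(1 for g, p in pairs if g not in positive and p in positive)
    fn = sum(1 for g, p in pairs if g in positive and p not in positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


E4_ONLY = {EvidenceGrade.E4}
E3_E4 = {EvidenceGrade.E3, EvidenceGrade.E4}

# Toy sentence-level labels, invented for illustration (not paper data).
G = EvidenceGrade
gold = [G.E1, G.E4, G.E3, G.E4, G.E2, G.E1]
pred = [G.E1, G.E4, G.E4, G.E3, G.E2, G.E4]

for name, positive in [("E4", E4_ONLY), ("E3+E4", E3_E4)]:
    p, r, f = binary_prf(gold, pred, positive)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
# E4: P=0.333 R=0.500 F1=0.400
# E3+E4: P=0.750 R=1.000 F1=0.857
```

Widening the positive set from {E4} to {E3, E4} changes which disagreements count as errors: in the toy data, the two E3/E4 confusions are penalized under the E4 metric but count as hits under E3+E4.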
Problem

Research questions and friction points this paper is trying to address.

medical hallucination
faithfulness
discharge summaries
electronic health records
patient safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphRAG
multi-agent framework
faithfulness hallucination detection
evidence chain
clinical documentation verification
Severin Ye
School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea
Xiao Kong
School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea
Xiaopeng He
West China School of Medicine, Sichuan University, Chengdu, China
Guangsu Yan
School of Airspace Science and Engineering, Shandong University, Weihai, China
Dongsuk Oh
Kyungpook National University, Assistant Professor
Natural Language Processing, Semantics