🤖 AI Summary
To improve the reliability of large language models (LLMs) in clinical question answering, this paper proposes ExpRAG, an experience retrieval-augmented generation framework built on electronic health records (EHRs). ExpRAG retrieves in two stages, from coarse to fine. First, an EHR-based report ranker identifies patients similar to the query patient. Second, an experience retriever extracts task-relevant content from those patients' discharge reports to support the LLM's reasoning. To evaluate the framework, the paper introduces DischargeQA, a clinical QA dataset of 1,280 discharge-related questions spanning diagnosis, medication, and instruction tasks, each generated from EHR data. In experiments, ExpRAG consistently outperforms a text-based ranker, with an average relative improvement of 5.2%, indicating that case-based clinical knowledge benefits medical reasoning.
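The two-stage retrieval described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Patient` record, the `TASK_SECTIONS` mapping, and the use of Jaccard overlap between sets of EHR codes as the coarse ranker are all hypothetical stand-ins, since this description does not specify how the ranker scores similarity or how the retriever selects content. Stage one keeps the `k` most similar patients; stage two keeps only the report section that matches the question's task, so the prompt carries task-relevant experience rather than whole reports.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical record: structured EHR codes plus a discharge report split into sections."""
    patient_id: str
    ehr_codes: set[str]
    report_sections: dict[str, str] = field(default_factory=dict)


# Hypothetical mapping from DischargeQA task type to the report section used as "experience".
TASK_SECTIONS = {
    "diagnosis": "Discharge Diagnosis",
    "medication": "Discharge Medications",
    "instruction": "Discharge Instructions",
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap, used here as a stand-in for the paper's EHR-based report ranker."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coarse_rank(query: Patient, pool: list[Patient], k: int) -> list[Patient]:
    """Stage 1 (coarse): keep the k other patients whose EHR codes overlap most with the query's."""
    others = [p for p in pool if p.patient_id != query.patient_id]
    # sorted() is stable even with reverse=True, so ties keep their original pool order.
    ranked = sorted(others, key=lambda p: jaccard(query.ehr_codes, p.ehr_codes), reverse=True)
    return ranked[:k]


def fine_extract(neighbors: list[Patient], task: str) -> list[str]:
    """Stage 2 (fine): keep only the task-relevant section of each neighbor's report."""
    section = TASK_SECTIONS[task]
    return [
        f"[{p.patient_id}] {p.report_sections[section]}"
        for p in neighbors
        if section in p.report_sections
    ]


def build_prompt(query: Patient, question: str, task: str, pool: list[Patient], k: int = 2) -> str:
    """Combine retrieved experience with the question into one LLM prompt string."""
    experience = fine_extract(coarse_rank(query, pool, k), task)
    context = "\n".join(experience) if experience else "(no similar cases found)"
    return f"Experience from similar patients:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from DischargeQA.
    pool = [
        Patient("A", {"heart_failure", "ckd"}, {"Discharge Medications": "furosemide"}),
        Patient("B", {"pneumonia"}, {"Discharge Medications": "amoxicillin"}),
        Patient("C", {"heart_failure", "diabetes"}, {"Discharge Medications": "metoprolol"}),
    ]
    query = Patient("Q", {"heart_failure", "ckd", "diabetes"})
    print(build_prompt(query, "Which medications should be continued?", "medication", pool))
```

With the toy data, patients A and C each share two of three codes with the query and are kept, while B shares none and is pruned, so only the A and C medication sections reach the prompt. Splitting retrieval this way lets a cheap structured comparison shrink the pool before any report text is read; a real system would likely replace Jaccard overlap with a learned or embedding-based ranker.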
📝 Abstract
To improve the reliability of Large Language Models (LLMs) in clinical applications, retrieval-augmented generation (RAG) is widely used to supply factual medical knowledge. However, beyond the general medical knowledge found in open-ended datasets, clinical case-based knowledge is also critical for effective medical reasoning, as it provides context grounded in real-world patient experiences. Motivated by this, we propose Experience Retrieval-Augmentation (ExpRAG), a framework based on Electronic Health Records (EHRs) that supplies relevant context from other patients' discharge reports. ExpRAG performs retrieval through a coarse-to-fine process: an EHR-based report ranker efficiently identifies similar patients, and an experience retriever then extracts task-relevant content for enhanced medical reasoning. To evaluate ExpRAG, we introduce DischargeQA, a clinical QA dataset with 1,280 discharge-related questions across diagnosis, medication, and instruction tasks. Each question is generated from EHR data to ensure realistic and challenging scenarios. Experimental results demonstrate that ExpRAG consistently outperforms a text-based ranker, achieving an average relative improvement of 5.2%, which highlights the importance of case-based knowledge for medical reasoning.
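The 5.2% figure is a relative gain, not an absolute one, and the abstract does not state how the average is taken. Assuming it is a mean over a set S of evaluation settings (for example, task and backbone-LLM combinations; this grouping is an assumption), with m_s the evaluation metric in setting s, the quantity would be:

```latex
\mathrm{RI}_{\text{avg}}
  = \frac{1}{|\mathcal{S}|} \sum_{s \in \mathcal{S}}
    \frac{m_s^{\text{ExpRAG}} - m_s^{\text{text}}}{m_s^{\text{text}}} \times 100\%
```

Under this reading, a hypothetical setting in which the text-based ranker scores 50.0 and ExpRAG scores 52.6 contributes (52.6 − 50.0) / 50.0 = 5.2%, which corresponds to only 2.6 points in absolute terms.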