🤖 AI Summary
This work addresses the challenge of modeling long-horizon structured electronic health records (EHRs), which often exceed the context window of large language models (LLMs), so conventional truncation or retrieval methods tend to overlook critical clinical events and their temporal dependencies. To overcome this limitation, the authors propose EHR-RAG, a framework that integrates joint event- and time-aware modeling, adaptive iterative query refinement, and counterfactual evidence reasoning within a retrieval-augmented generation (RAG) architecture. By combining hybrid retrieval with dual-path factual and counterfactual reasoning, EHR-RAG preserves the longitudinal structure and dynamic nature of EHR data. Evaluated on four long-horizon clinical prediction tasks, the method achieves an average Macro-F1 improvement of 10.76% over the strongest LLM-based baselines, demonstrating its stronger ability to capture clinically relevant temporal patterns.
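The reported gains are measured in Macro-F1, which averages per-class F1 scores without weighting by class frequency, so rare clinical outcomes count as much as common ones. A minimal sketch of the standard definition follows; the function name `macro_f1` is our own, and the paper's exact evaluation code (label set, zero-division handling) is not given in the abstract.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]   # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Per-class F1 here is 0.5, 0.8 and 0.0, so the macro average is 1.3 / 3.
print(macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 0]))
```

Because each class contributes equally, a model that ignores a minority class (class 2 above) is penalized heavily even if it does well on the majority classes.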
📝 Abstract
Electronic Health Records (EHRs) provide rich longitudinal clinical evidence that is central to medical decision-making, motivating the use of retrieval-augmented generation (RAG) to ground large language model (LLM) predictions. However, long-horizon EHRs often exceed LLM context limits, and existing approaches commonly rely on truncation or vanilla retrieval strategies that discard clinically relevant events and temporal dependencies. To address these challenges, we propose EHR-RAG, a retrieval-augmented framework designed for accurate interpretation of long-horizon structured EHR data. EHR-RAG introduces three components tailored to longitudinal clinical prediction tasks: Event- and Time-Aware Hybrid EHR Retrieval, which preserves clinical structure and temporal dynamics; Adaptive Iterative Retrieval, which progressively refines queries to broaden evidence coverage; and Dual-Path Evidence Retrieval and Reasoning, which jointly retrieves and reasons over both factual and counterfactual evidence. Experiments across four long-horizon EHR prediction tasks show that EHR-RAG consistently outperforms the strongest LLM-based baselines, achieving an average Macro-F1 improvement of 10.76%. Overall, our work highlights the potential of retrieval-augmented LLMs to advance clinical prediction on structured EHR data in practice.
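The abstract does not say how the hybrid retriever scores events. Purely as a hypothetical illustration of what "event- and time-aware" retrieval over a long record can mean, the sketch below scores each event by token overlap with the query, a bonus for matching a requested event type, and an exponential recency decay. It drops events after the prediction time, keeps only the top-k to respect a context budget, and returns them in chronological order so temporal structure survives. `Event`, `hybrid_retrieve`, `alpha`, and `half_life` are our own names and parameters, not EHR-RAG's.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    day: float   # days since admission (hypothetical time unit)
    kind: str    # event type, e.g. "lab", "med", "diagnosis"
    text: str    # serialized event content

def lexical_overlap(query: str, text: str) -> float:
    """Jaccard overlap between lowercased token sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def hybrid_retrieve(events, query, query_kinds, t_pred, k=3, alpha=0.6, half_life=30.0):
    """Hypothetical hybrid scorer: content match (text + event type) blended with recency."""
    decay = math.log(2) / half_life          # recency weight halves every half_life days
    scored = []
    for e in events:
        if e.day > t_pred:                   # never use events after the prediction time
            continue
        content = 0.5 * lexical_overlap(query, e.text) + 0.5 * (e.kind in query_kinds)
        recency = math.exp(-decay * (t_pred - e.day))
        scored.append((alpha * content + (1 - alpha) * recency, e))
    top = sorted(scored, key=lambda se: se[0], reverse=True)[:k]
    return sorted((e for _, e in top), key=lambda e: e.day)  # restore chronology

record = [
    Event(0, "diagnosis", "acute kidney injury"),
    Event(10, "lab", "creatinine high"),
    Event(40, "lab", "creatinine normal"),
    Event(45, "med", "furosemide started"),
    Event(60, "lab", "creatinine high"),     # after t_pred, so it is excluded
]
selected = hybrid_retrieve(record, "creatinine trend", {"lab"}, t_pred=50, k=2)
print([(e.day, e.text) for e in selected])
```

Re-sorting the selected events by time is the design choice that matters here: relevance ranking decides what fits in the context window, but the model still sees the evidence as a timeline rather than as a relevance-ordered list.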