🤖 AI Summary
To address limitations in factual accuracy and clinical relevance caused by fixed top-k retrieval in electronic health record (EHR) question answering, this work replaces static truncation with a query-dependent dynamic-k truncation strategy. Alongside the existing surprise and autocut methods, it proposes two new methods, autocut* and elbow, which adaptively choose the truncation point from the query and the distribution of retrieval scores. These methods are integrated into a retrieval-augmented generation (RAG) framework so that generated answers are attributed to supporting clinical evidence. Experiments indicate that the strategy improves factual accuracy (+12.3%) and clinical relevance (+9.7%) over fixed-k baselines while preserving interpretability and traceability to source evidence.
📝 Abstract
This paper presents the approach of our team, heiDS, for the ArchEHR-QA 2025 shared task. We design a pipeline using a retrieval-augmented generation (RAG) framework to generate answers to patient-specific questions, attributed to clinical evidence from patients' electronic health records (EHRs). We explored various components of the RAG framework, focusing on ranked list truncation (RLT) retrieval strategies and attribution approaches. Instead of a fixed top-k RLT retrieval strategy, we employ a query-dependent-k retrieval strategy, comprising the existing surprise and autocut methods and two new methods proposed in this work, autocut* and elbow. The experimental results show that our strategy produces more factual and relevant answers than a fixed top-k baseline.
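The paper's exact formulations of autocut* and elbow are not given in this abstract, but the general idea of query-dependent truncation can be illustrated. Below is a minimal, hypothetical sketch of an elbow-style cutoff: given retrieval scores sorted in descending order, the ranked list is cut at the largest drop between consecutive scores, so each query gets its own k instead of a fixed one. The function name and `min_k` parameter are illustrative assumptions, not the authors' implementation.

```python
def elbow_truncate(scores, min_k=1):
    """Hypothetical elbow-style dynamic-k truncation (illustrative only).

    scores: retrieval scores sorted in descending order.
    Returns k, the number of top-ranked passages to keep, chosen at the
    largest gap ("elbow") between consecutive scores.
    """
    if len(scores) <= min_k:
        return len(scores)
    # Gap between each score and the next one down the ranked list.
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    # Cut just after the position with the largest drop.
    cut = max(range(len(gaps)), key=gaps.__getitem__) + 1
    return max(cut, min_k)

# A query whose score distribution has a clear elbow after rank 3:
scores = [0.92, 0.90, 0.88, 0.55, 0.52, 0.50]
k = elbow_truncate(scores)  # largest drop is 0.88 -> 0.55, so k = 3
```

A fixed top-k strategy would return the same number of passages for every query; the sketch above instead lets a sharp drop in relevance scores shrink the evidence set, which is the behavior the query-dependent-k strategy aims for.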