🤖 AI Summary
This study addresses the challenge of improving answer accuracy in clinical question answering over electronic health records (EHRs), where precise retrieval of contextually relevant sentences and generation of concise, traceable answers are critical. We propose a two-stage LLM pipeline: (1) fine-grained sentence retrieval via few-shot prompting and self-consistent sampling, augmented by a dynamic-threshold sentence classification mechanism; and (2) answer generation conditioned on retrieved sentences. A key empirical finding is that an 8B-parameter model substantially outperforms a 70B-parameter model on EHR sentence-level retrieval—highlighting that retrieval precision, rather than model scale, governs downstream answer quality. Our approach achieves state-of-the-art performance on the ArchEHR-QA 2025 benchmark, demonstrating that lightweight models combined with controllable, reasoning-aware retrieval strategies yield superior efficacy and practicality in medical-domain QA.
📝 Abstract
We describe our system for the ArchEHR-QA Shared Task on answering clinical questions using electronic health records (EHRs). Our approach uses large language models in two steps: first, to find the sentences in the EHR that are relevant to a clinician's question, and second, to generate a short, citation-supported response grounded in those sentences. To improve the sentence classification step, which decides whether each sentence is essential, we use few-shot prompting, self-consistency, and thresholding. We compare several models and find that a smaller 8B model outperforms a larger 70B model at identifying relevant information. Our results show that accurate sentence selection is critical for generating high-quality responses, and that self-consistency with thresholding makes these decisions more reliable.
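The self-consistency-with-thresholding idea can be sketched as follows: the model classifies each sentence several times under sampling, and a sentence is kept as essential only if the fraction of "essential" votes clears a threshold. This is a minimal illustration under stated assumptions; the vote labels, the `keep_sentence` helper, and the fixed threshold value are all hypothetical, and the paper's dynamic-threshold mechanism is not reproduced here.

```python
def keep_sentence(votes, threshold=0.6):
    """Aggregate repeated sampled classifications of one EHR sentence.

    votes: labels sampled from an LLM classifier for the same sentence,
    e.g. "essential", "supplementary", "not-relevant" (illustrative labels).
    Returns (keep, share): keep is True only when the share of
    "essential" votes reaches the threshold.
    """
    share = votes.count("essential") / len(votes)
    return share >= threshold, share

# Hypothetical sampled votes for two sentences (5 samples each).
votes_a = ["essential"] * 4 + ["supplementary"]
votes_b = ["essential", "essential", "supplementary",
           "not-relevant", "supplementary"]

print(keep_sentence(votes_a))  # (True, 0.8): consistent votes, kept
print(keep_sentence(votes_b))  # (False, 0.4): too inconsistent, dropped
```

Raising the threshold trades recall for precision: only sentences the model labels consistently across samples survive, which is the reliability effect the abstract attributes to self-consistency with thresholding.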