UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of improving answer accuracy in clinical question answering over electronic health records (EHRs), where precise retrieval of contextually relevant sentences and generation of concise, traceable answers are critical. We propose a two-stage LLM pipeline: (1) fine-grained sentence retrieval via few-shot prompting and self-consistent sampling, augmented by a dynamic-threshold sentence classification mechanism; and (2) answer generation conditioned on the retrieved sentences. A key empirical finding is that an 8B-parameter model substantially outperforms a 70B-parameter model on EHR sentence-level retrieval, highlighting that retrieval precision, rather than model scale, governs downstream answer quality. Our approach achieves state-of-the-art performance on the ArchEHR-QA 2025 benchmark, demonstrating that lightweight models paired with controllable, reasoning-aware retrieval strategies can be both more effective and more practical for medical-domain QA.

📝 Abstract
We describe our system for the ArchEHR-QA Shared Task on answering clinical questions using electronic health records (EHRs). Our approach uses large language models in two steps: first, to find sentences in the EHR relevant to a clinician's question, and second, to generate a short, citation-supported response based on those sentences. We use few-shot prompting, self-consistency, and thresholding to improve the sentence classification step that decides which sentences are essential. We compare several models and find that a smaller 8B model performs better than a larger 70B model at identifying relevant information. Our results show that accurate sentence selection is critical for generating high-quality responses and that self-consistency with thresholding makes these decisions more reliable.
Problem

Research questions and friction points this paper is trying to address.

Improving clinical question answering using EHRs
Enhancing sentence relevance classification in EHRs
Optimizing model performance for accurate EHR responses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step LLM process for EHR question answering
Self-consistency and thresholding improve sentence selection
Smaller 8B model outperforms larger 70B model
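The self-consistency-with-thresholding idea above can be sketched as a simple voting scheme: sample the LLM's sentence classification several times, then keep a sentence as "essential" only if enough samples agree. This is a minimal illustration, not the paper's implementation; the paper uses a dynamic threshold and real LLM samples, which are represented here by hypothetical sets of sentence IDs.

```python
from collections import Counter

def select_essential_sentences(samples, threshold=0.5):
    """Aggregate sentence IDs labeled 'essential' across several
    self-consistency samples, keeping IDs whose vote fraction
    meets the threshold.

    samples   -- list of sets, each the 'essential' IDs from one LLM sample
    threshold -- minimum fraction of samples that must agree
    """
    votes = Counter(sid for sample in samples for sid in sample)
    n = len(samples)
    return sorted(sid for sid, count in votes.items() if count / n >= threshold)

# Five hypothetical self-consistency samples from an LLM
samples = [{1, 3}, {1, 3, 7}, {3}, {1, 3}, {1, 4}]
print(select_essential_sentences(samples, threshold=0.6))  # [1, 3]
```

Raising the threshold trades recall for precision: outlier sentences that appear in only one or two samples (7 and 4 above) are filtered out, which is the reliability effect the abstract attributes to self-consistency with thresholding.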
Sara Shields-Menard
The University of Texas at San Antonio
Zach Reimers
The University of Texas at San Antonio
Joshua Gardner
The University of Texas at San Antonio
David Perry
The University of Texas at San Antonio
Anthony Rios
Associate Professor in Information Systems and Cyber Security
Natural Language Processing, Biomedical Informatics, Computational Social Science, Social Computing