Neural at ArchEHR-QA 2026: One Method Fits All: Unified Prompt Optimization for Clinical QA over EHRs

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses key challenges in automated question answering over electronic health records (EHRs), namely evidence retrieval, answer faithfulness, and clinical grounding. It proposes a modular prompt optimization framework that requires no fine-tuning. The approach decomposes clinical QA into four sequential stages: question interpretation, evidence identification, answer generation, and evidence alignment. The framework uses DSPy's MIPROv2 optimizer to automatically search for high-performing prompts (instructions and few-shot demonstrations) for each stage. It adds self-consistency voting and stage-specific verification to make each stage's outputs more reliable. On the ArchEHR-QA 2026 benchmark, the method places 4th, 1st, 4th, and 7th on the four subtasks. This gives a mean rank of 4.00, second among teams that entered all four subtasks. The results suggest that systematic, per-stage prompt optimization can be a cost-effective alternative to model fine-tuning for complex clinical QA.
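The four-stage decomposition described above can be sketched as a chain of functions over a shared state object. This is a minimal sketch under stated assumptions, not the authors' code. In the paper, each stage is an LLM call whose prompt MIPROv2 optimizes. Here, every stage is a hypothetical keyword-matching stand-in so the example runs without a model. The names `QAState`, `run_pipeline`, and the four stage functions are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stopword list used only by the toy stand-in stages below.
STOPWORDS = {"why", "was", "my", "the", "a", "i", "did", "what", "is", "to"}


@dataclass
class QAState:
    """State handed from one stage to the next (hypothetical schema)."""
    question: str
    note: list[str]                       # clinical note, one sentence per entry
    keywords: set[str] = field(default_factory=set)
    evidence: list[int] = field(default_factory=list)
    answer: list[str] = field(default_factory=list)
    alignment: dict[int, list[int]] = field(default_factory=dict)


def _words(text: str) -> set[str]:
    """Lowercased content words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS


def interpret_question(s: QAState) -> QAState:
    # Stage 1 stand-in (an LLM call in the paper): reduce the question to content words.
    s.keywords = _words(s.question)
    return s


def identify_evidence(s: QAState) -> QAState:
    # Stage 2 stand-in: keep note sentences that share a keyword with the question.
    s.evidence = [i for i, sent in enumerate(s.note) if _words(sent) & s.keywords]
    return s


def generate_answer(s: QAState) -> QAState:
    # Stage 3 stand-in: build the answer only from the selected evidence.
    s.answer = [s.note[i] for i in s.evidence]
    return s


def align_evidence(s: QAState) -> QAState:
    # Stage 4 stand-in: link each answer sentence to the note sentence it came from.
    s.alignment = {a: [i] for a, i in enumerate(s.evidence)}
    return s


# Every stage shares the same QAState -> QAState interface, so any one can be swapped out.
PIPELINE: list[Callable[[QAState], QAState]] = [
    interpret_question, identify_evidence, generate_answer, align_evidence,
]


def run_pipeline(question: str, note: list[str]) -> QAState:
    state = QAState(question, note)
    for stage in PIPELINE:  # stages run strictly in order
        state = stage(state)
    return state


if __name__ == "__main__":
    note = ["Patient admitted with chest pain.", "Troponin was elevated.", "Discharged on aspirin."]
    result = run_pipeline("Why was my troponin elevated?", note)
    print(result.evidence, result.alignment)  # → [1] {0: [1]}
```

Keeping every stage behind the same `QAState -> QAState` interface means one stage can be replaced, for example by a prompt-optimized LLM module, without touching the others. Per-stage optimization depends on that property.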
📝 Abstract
Automated question answering (QA) over electronic health records (EHRs) demands precise evidence retrieval, faithful answer generation, and explicit grounding of answers in clinical notes. In this work, we present Neural1.5, our method for the ArchEHR-QA 2026 shared task at CL4Health@LREC 2026, which comprises four subtasks: question interpretation, evidence identification, answer generation, and evidence alignment. Our approach decouples the task into independent, modular stages and employs DSPy's MIPROv2 optimizer to automatically discover high-performing prompts, jointly tuning instructions and few-shot demonstrations for each stage. Within every stage, self-consistency voting over multiple stochastic inference runs suppresses spurious errors and improves reliability, while stage-specific verification mechanisms (e.g., self-reflection and chain-of-verification for alignment) further refine output quality. Among all teams that participated in all four subtasks, our method ranks second overall (mean rank 4.00), placing 4th, 1st, 4th, and 7th on Subtasks 1-4, respectively. These results demonstrate that systematic, per-stage prompt optimization combined with self-consistency mechanisms is a cost-effective alternative to model fine-tuning for multifaceted clinical QA.
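Self-consistency voting, as the abstract describes it, aggregates several stochastic runs of the same stage into one output. The sketch below assumes two aggregation rules that the abstract does not specify. Single-label outputs use a plurality vote. Set-valued outputs, such as evidence sentence IDs, use a per-item majority vote. The function names and the 0.5 threshold are hypothetical, and the run outputs are made-up inputs rather than results from the paper.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def majority_label(samples: Sequence[Hashable]) -> Hashable:
    """Plurality vote over single-valued outputs from repeated runs.

    Counter.most_common orders equal counts by first occurrence, so ties
    go to the label seen earliest.
    """
    return Counter(samples).most_common(1)[0][0]


def majority_set(samples: Sequence[Iterable[int]], threshold: float = 0.5) -> set[int]:
    """Per-item vote over set-valued outputs (e.g. evidence sentence IDs).

    An item is kept if it appears in more than `threshold` of the runs.
    """
    counts: Counter[int] = Counter()
    for run in samples:
        counts.update(set(run))  # dedupe so each run votes at most once per item
    n = len(samples)
    return {item for item, c in counts.items() if c / n > threshold}


if __name__ == "__main__":
    # Five hypothetical stochastic runs of an evidence-identification stage.
    runs = [{1, 3, 4}, {1, 4}, {1, 2, 4}, {1, 3}, {4, 1}]
    print(sorted(majority_set(runs)))                             # → [1, 4]
    print(majority_label(["yes", "no", "yes", "yes", "no"]))      # → yes
```

With five runs, a sentence selected in four of them (ID 4) is kept. A sentence selected in only two (ID 3) or one (ID 2) is dropped as a likely spurious selection.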
Problem

Research questions and friction points this paper is trying to address.

clinical QA
electronic health records
evidence retrieval
answer generation
evidence grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt optimization
modular QA pipeline
self-consistency voting
evidence alignment
clinical question answering