Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge clinicians face in applying lengthy evidence-based guidelines during fast-paced primary care consultations. To bridge this gap, the authors propose a multi-stage reasoning prompting strategy, built on the large language model Gemini 2.5, that automatically generates clinically relevant, guideline-aligned questions—rather than direct answers—from real patient–physician dialogues. The approach, implemented via zero-shot and multi-stage prompting, was evaluated on 80 authentic consultation transcripts and assessed by six senior physicians over more than 90 hours of structured review. Results indicate that, while general-purpose LLMs are not yet fully reliable, the generated questions show strong clinical relevance and guideline adherence, suggesting real potential to reduce physicians' cognitive load and make evidence-based medicine more actionable in frontline clinical practice.

📝 Abstract
Evidence-based medicine (EBM) is central to high-quality care, but remains difficult to implement in fast-paced primary care settings. Physicians face short consultations, increasing patient loads, and lengthy guideline documents that are impractical to consult in real time. To address this gap, we investigate the feasibility of using large language models (LLMs) as ambient assistants that surface targeted, evidence-based questions during physician-patient encounters. Our study focuses on question generation rather than question answering, with the aim of scaffolding physician reasoning and integrating guideline-based practice into brief consultations. We implemented two prompting strategies, a zero-shot baseline and a multi-stage reasoning variant, using Gemini 2.5 as the backbone model. We evaluated both strategies on a benchmark of 80 de-identified transcripts from real clinical encounters, with six experienced physicians contributing over 90 hours of structured review. Results indicate that while general-purpose LLMs are not yet fully reliable, they can produce clinically meaningful and guideline-relevant questions, suggesting significant potential to reduce cognitive burden and make EBM more actionable at the point of care.
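The multi-stage reasoning variant described in the abstract can be pictured as a pipeline: first extract the clinically relevant facts from the dialogue, then combine them with guideline material to generate questions (not answers). The sketch below is illustrative only, under assumed stage prompts; `call_llm` is a hypothetical stand-in for the backbone model call (the paper uses Gemini 2.5), and the function names are not from the paper.

```python
# Minimal sketch of a multi-stage question-generation pipeline,
# assuming a two-stage prompt design. `call_llm` is a hypothetical
# placeholder for a real backbone-model call (e.g. Gemini 2.5).

def call_llm(prompt: str) -> str:
    """Stub: in practice, send `prompt` to the backbone LLM."""
    return f"[model output for: {prompt[:40]}...]"

def extract_findings(transcript: str) -> str:
    # Stage 1: condense the dialogue into clinically relevant facts.
    return call_llm(
        "Extract the key clinical findings from this "
        f"patient-physician dialogue:\n{transcript}"
    )

def generate_questions(transcript: str, guideline_excerpt: str) -> str:
    # Stage 2: ground the findings in the guideline text and ask
    # targeted questions that scaffold the physician's reasoning.
    findings = extract_findings(transcript)
    prompt = (
        f"Given these findings:\n{findings}\n"
        f"and this guideline excerpt:\n{guideline_excerpt}\n"
        "List targeted questions the physician should consider. "
        "Generate questions only, not answers."
    )
    return call_llm(prompt)
```

A zero-shot baseline, by contrast, would collapse both stages into a single prompt containing the raw transcript; the staged version exists to force intermediate reasoning before question generation.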
Problem

Research questions and friction points this paper is trying to address.

Evidence-based medicine
Question generation
Clinical decision support
Primary care
Physician-patient dialogue
Innovation

Methods, ideas, or system contributions that make the work stand out.

Question generation
Evidence-based medicine
Large language models
Clinical decision support
Prompt engineering