🤖 AI Summary
Existing LLM benchmarks focus predominantly on in-hospital diagnostic reasoning and neglect post-discharge patient education, a critical component of care continuity. Method: We introduce the first systematic benchmark for discharge communication, built on multi-turn, personalized dialogues between a DoctorAgent and a PatientAgent that simulate diverse clinical scenarios and patient profiles. Our evaluation framework combines structured health document generation, AHRQ guideline compliance checking, LLM-as-judge assessment, and multiple-choice comprehension testing, quantifying performance along dialogue quality, document quality, and patient comprehension. Contribution/Results: Experiments on 18 state-of-the-art LLMs reveal no significant positive correlation between model scale and educational effectiveness, exposing a fundamental trade-off between content prioritization and the application of communicative strategies. These findings highlight structural limitations of current LLMs in delivering personalized, clinically appropriate discharge instructions and underscore the need for task-specific architectural and training innovations.
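To make the simulation pipeline concrete, here is a minimal Python sketch of the multi-turn DoctorAgent/PatientAgent loop, structured topic by topic. Everything in it is an illustrative assumption rather than the benchmark's actual API: the identifiers (`PatientProfile`, `ChatFn`, `run_discharge_dialogue`), the prompts, and the placeholder topic list.

```python
# Illustrative sketch of a DoctorAgent/PatientAgent discharge dialogue loop.
# All names and prompts are hypothetical; they are NOT DischargeSim's API.
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A chat function maps a system prompt plus conversation history to a reply;
# in practice this would wrap an LLM call.
ChatFn = Callable[[str, List[Tuple[str, str]]], str]

@dataclass
class PatientProfile:
    """Psychosocial attributes, mirroring the profiles named in the abstract."""
    health_literacy: str  # e.g., "low", "high"
    education: str        # e.g., "primary school", "college"
    emotion: str          # e.g., "anxious", "calm"

# Placeholder stand-ins for the six clinically grounded discharge topics;
# the paper's exact topic list may differ.
DISCHARGE_TOPICS = [
    "diagnosis recap", "medications", "follow-up care",
    "warning signs", "lifestyle guidance", "patient questions",
]

def run_discharge_dialogue(chat_fn: ChatFn, profile: PatientProfile,
                           turns_per_topic: int = 2) -> List[Tuple[str, str]]:
    """Simulate one post-visit conversation, walking topic by topic."""
    doctor_sys = ("You are a discharge educator. Adapt explanations to a "
                  f"patient with {profile.health_literacy} health literacy.")
    patient_sys = (f"You are a patient with {profile.education} education "
                   f"who feels {profile.emotion}. Ask about anything unclear.")
    history: List[Tuple[str, str]] = []
    for topic in DISCHARGE_TOPICS:
        history.append(("system", f"Current topic: {topic}"))
        for _ in range(turns_per_topic):
            history.append(("doctor", chat_fn(doctor_sys, history)))
            history.append(("patient", chat_fn(patient_sys, history)))
    return history

if __name__ == "__main__":
    # Stub chat function so the sketch runs without an LLM backend.
    echo: ChatFn = lambda sys_prompt, hist: f"[reply after {len(hist)} msgs]"
    profile = PatientProfile("low", "primary school", "anxious")
    print(len(run_discharge_dialogue(echo, profile)), "messages generated")
```

The transcripts such a loop produces would then feed the downstream evaluation stages (document generation, guideline checks, and the comprehension exam).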
📝 Abstract
Discharge communication is a critical yet underexplored component of patient care, where the goal shifts from diagnosis to education. While recent large language model (LLM) benchmarks emphasize in-visit diagnostic reasoning, they fail to evaluate models' ability to support patients after the visit. We introduce DischargeSim, a novel benchmark that evaluates LLMs on their ability to act as personalized discharge educators. DischargeSim simulates post-visit, multi-turn conversations between LLM-driven DoctorAgents and PatientAgents with diverse psychosocial profiles (e.g., health literacy, education, emotion). Interactions are structured across six clinically grounded discharge topics and assessed along three axes: (1) dialogue quality via automatic and LLM-as-judge evaluation, (2) personalized document generation including free-text summaries and structured AHRQ checklists, and (3) patient comprehension through a downstream multiple-choice exam. Experiments across 18 LLMs reveal significant gaps in discharge education capability, with performance varying widely across patient profiles. Notably, larger models do not always yield better education outcomes, highlighting trade-offs in strategy use and content prioritization. DischargeSim offers a first step toward benchmarking LLMs in post-visit clinical education and promoting equitable, personalized patient support.
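As one concrete reading of the third evaluation axis, the sketch below scores a simulated patient on a multiple-choice exam after the dialogue, taking accuracy as the education-effectiveness signal. The `MCQ` schema, `answer_fn` interface, and sample question are hypothetical illustrations, not the paper's data format.

```python
# Hypothetical sketch of the downstream comprehension exam (axis 3).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MCQ:
    question: str
    choices: List[str]  # e.g., ["A) With meals", "B) Only at bedtime"]
    answer_idx: int     # index of the correct choice

def comprehension_score(answer_fn: Callable[[str], int],
                        exam: List[MCQ]) -> float:
    """Fraction of exam questions the simulated patient answers correctly."""
    correct = 0
    for q in exam:
        prompt = q.question + "\n" + "\n".join(q.choices)
        correct += int(answer_fn(prompt) == q.answer_idx)  # agent picks an index
    return correct / len(exam)

if __name__ == "__main__":
    exam = [MCQ("When should you take the antibiotic?",
                ["A) With meals", "B) Only at bedtime"], 0)]
    always_a = lambda prompt: 0  # stub agent: always answers choice A
    print(comprehension_score(always_a, exam))  # -> 1.0
```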