🤖 AI Summary
Emergency medical dispatch (EMD) is complicated by emotionally distressed callers, ambiguous information, and high cognitive load on dispatchers. To address these challenges, we propose a high-fidelity EMD simulation system that combines a clinical taxonomy (32 chief complaints and 6 caller identities derived from MIMIC-III, together with a six-phase call protocol) with an AutoGen-based multi-agent framework. Two specialized agents, Caller and Dispatcher, interact over a shared fact commons that keeps exchanges clinically plausible and limits misinformation. The system is intended for dispatcher training and protocol evaluation, and as a foundation for real-time decision support. Across 100 simulated cases assessed by four physicians, the system correctly contacted other relevant agents in 94% of cases and provided advice in 91%; automated linguistic analysis found the output predominantly neutral in tone, highly readable, and consistently polite. Our key contribution is embedding a structured clinical taxonomy into the multi-agent collaboration workflow to improve simulation fidelity, safety, and clinical credibility.
📝 Abstract
Objective: Emergency medical dispatch (EMD) is a high-stakes process challenged by caller distress, ambiguity, and cognitive load. Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, LLM-powered multi-agent system for simulating realistic EMD scenarios.
Methods: We constructed a clinical taxonomy (32 chief complaints, 6 caller identities from MIMIC-III) and a six-phase call protocol. Using this framework, we developed an AutoGen-based MAS with Caller and Dispatcher Agents. The system grounds interactions in a fact commons to ensure clinical plausibility and mitigate misinformation. We used a hybrid evaluation framework: four physicians assessed 100 simulated cases for "Guidance Efficacy" and "Dispatch Effectiveness," supplemented by automated linguistic analysis (sentiment, readability, politeness).
Results: Human evaluation, with substantial inter-rater agreement (Gwet's AC1 > 0.70), confirmed the system's high performance. It demonstrated excellent Dispatch Effectiveness (e.g., correctly contacting other relevant agents in 94% of cases) and Guidance Efficacy (advice provided in 91% of cases), both rated highly by physicians. Algorithmic metrics corroborated these findings, indicating a predominantly neutral affective profile (73.7% neutral sentiment; 90.4% neutral emotion), high readability (Flesch 80.9), and a consistently polite style (60.0% polite; 0% impolite).
Conclusion: Our taxonomy-grounded MAS simulates diverse, clinically plausible dispatch scenarios with high fidelity. Findings support its use for dispatcher training, protocol evaluation, and as a foundation for real-time decision support. This work outlines a pathway for safely integrating advanced AI agents into emergency response workflows.
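For readers unfamiliar with the agreement statistic cited in the Results, the following is a minimal sketch of how Gwet's AC1 can be computed for several raters and nominal categories. It is not the authors' code; the function name `gwet_ac1` and the toy ratings are assumptions made purely for illustration, and the abstract does not describe how the reported value was actually calculated.

```python
from collections import Counter


def gwet_ac1(ratings):
    """Gwet's AC1 for nominal ratings (illustrative sketch).

    ratings: list of items; each item is a list of category labels,
             one label per rater (at least two raters per item).
    """
    categories = {label for item in ratings for label in item}
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two observed categories")

    n = len(ratings)
    observed = 0.0                        # running sum of per-item agreement
    prevalence = dict.fromkeys(categories, 0.0)

    for item in ratings:
        r = len(item)
        if r < 2:
            raise ValueError("each item needs at least two ratings")
        counts = Counter(item)
        # Share of rater pairs that agree on this item.
        observed += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        # Average share of raters choosing each category.
        for label, c in counts.items():
            prevalence[label] += c / r / n

    p_a = observed / n
    # Chance agreement under Gwet's model.
    p_e = sum(p * (1 - p) for p in prevalence.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings from four raters on four cases (not study data).
toy = [
    ["yes", "yes", "yes", "yes"],
    ["yes", "yes", "yes", "no"],
    ["no", "no", "no", "no"],
    ["yes", "yes", "yes", "yes"],
]
print(round(gwet_ac1(toy), 3))
```

Unlike Cohen's or Fleiss' kappa, AC1 stays stable when one category dominates the ratings, which is a common reason to prefer it when most cases are rated favorably.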