Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

📅 2025-03-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit inconsistent reasoning and unreliable decision-making in dynamic, multi-turn clinical diagnosis scenarios. Method: We propose MedAgentSim, the first open-source clinical simulation environment enabling collaborative multi-agent interaction among physician, patient, and diagnostic agents. It supports active history-taking, on-demand test ordering (e.g., MRI, blood pressure), and dynamic diagnostic reasoning. Our approach introduces a novel self-evolving diagnostic mechanism integrating multi-agent coordination, chain-of-thought (CoT) reasoning, and retrieval-augmented generation (RAG). We further construct the first fine-grained evaluation benchmark tailored to dynamic clinical dialogues. Contribution/Results: Experiments demonstrate significant improvements in LLM diagnostic accuracy and strategic consistency. The framework supports both fully automated evaluation and human-in-the-loop assessment. All code, tools, and the benchmark are publicly released.

📝 Abstract
In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing the LLM's ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations in various simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at https://medagentsim.netlify.app/.
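The consultation loop the abstract describes (a doctor agent interviewing a patient agent, ordering tests from a measurement agent, and retrieving past cases as experience) can be sketched in miniature. This is an illustrative sketch only, not the paper's actual API: `ask_llm` is a stub standing in for a real LLM call, and the keyword-overlap retrieval is a hypothetical stand-in for the RAG component.

```python
from dataclasses import dataclass, field

def ask_llm(role: str, prompt: str) -> str:
    """Stub LLM: returns canned replies per agent role (assumption for the sketch)."""
    canned = {
        "doctor": "REQUEST_TEST: blood_pressure",
        "patient": "I've had headaches for three days.",
        "measurement": "blood_pressure: 150/95 mmHg",
    }
    return canned[role]

@dataclass
class DoctorAgent:
    # Memory of previously solved cases, used for experience-based retrieval.
    memory: list = field(default_factory=list)

    def retrieve(self, complaint: str) -> list:
        # Naive stand-in for RAG: return past cases sharing a keyword with the complaint.
        return [case for case in self.memory
                if any(word in case for word in complaint.split())]

    def step(self, transcript: list) -> str:
        # Condition the doctor's next action on the dialogue plus retrieved experience.
        context = "\n".join(transcript + self.retrieve(transcript[0]))
        return ask_llm("doctor", context)

def run_consultation(max_turns: int = 3) -> list:
    doctor = DoctorAgent(memory=["headaches high blood_pressure -> hypertension"])
    # Patient opens with a chief complaint.
    transcript = [ask_llm("patient", "describe your symptoms")]
    for _ in range(max_turns):
        action = doctor.step(transcript)
        transcript.append(action)
        # Test orders are routed to the measurement agent, mimicking real exams.
        if action.startswith("REQUEST_TEST:"):
            transcript.append(ask_llm("measurement", action))
    return transcript
```

In the full system, a correct final diagnosis would also be appended back into `memory`, which is what makes the doctor agent improve as it sees more patients.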
Problem

Research questions and friction points this paper is trying to address.

Simulating realistic clinical interactions for LLM evaluation
Enhancing diagnostic accuracy through multi-agent conversations
Developing an automated benchmark for dynamic diagnostic assessments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent simulations for clinical interactions
Self-improving diagnostic strategies via feedback
Integrated multi-agent discussions and reasoning