AI Summary
Large language models (LLMs) exhibit inconsistent reasoning and unreliable decision-making in dynamic, multi-turn clinical diagnosis scenarios. Method: We propose MedAgentSim, the first open-source clinical simulation environment enabling collaborative multi-agent interaction among physician, patient, and diagnostic agents. It supports active history-taking, on-demand test ordering (e.g., MRI, blood pressure), and dynamic diagnostic reasoning. Our approach introduces a novel self-evolving diagnostic mechanism integrating multi-agent coordination, chain-of-thought (CoT) reasoning, and retrieval-augmented generation (RAG). We further construct the first fine-grained evaluation benchmark tailored to dynamic clinical dialogues. Contribution/Results: Experiments demonstrate significant improvements in LLM diagnostic accuracy and strategic consistency. The framework supports both fully automated evaluation and human-in-the-loop assessment. All code, tools, and the benchmark are publicly released.
Abstract
In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing an LLM's ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations in various simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at https://medagentsim.netlify.app/.
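The doctor, patient, and measurement interaction described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not MedAgentSim's implementation: every class, method, and function name below is hypothetical, the LLM calls are replaced by dictionary lookups, the experience-based retrieval is reduced to picking the stored case with the largest overlap in findings, and the case data is made-up toy data.

```python
from dataclasses import dataclass, field


@dataclass
class PatientAgent:
    """Hypothetical stand-in for an LLM patient: answers from a fixed case record."""
    history: dict

    def answer(self, topic):
        return self.history.get(topic, "not sure")


@dataclass
class MeasurementAgent:
    """Hypothetical stand-in for the measurement agent: returns results on request."""
    results: dict

    def measure(self, test):
        return self.results.get(test, "unavailable")


@dataclass
class DoctorAgent:
    """Hypothetical doctor: gathers findings, then diagnoses from stored experience."""
    topics: list
    tests: list
    experience: list = field(default_factory=list)  # (findings, diagnosis) pairs

    def diagnose(self, findings):
        # Assumed retrieval rule: reuse the diagnosis of the past case that
        # shares the most findings with the current one.
        if not self.experience:
            return "unknown"
        best_findings, best_diagnosis = max(
            self.experience, key=lambda case: len(case[0] & findings)
        )
        return best_diagnosis if best_findings & findings else "unknown"

    def remember(self, findings, diagnosis):
        # Store a finished case so later episodes can retrieve it.
        self.experience.append((findings, diagnosis))


def run_episode(doctor, patient, measurement):
    """One multi-turn encounter: history-taking, test requests, then a diagnosis."""
    findings = set()
    transcript = []
    for topic in doctor.topics:
        reply = patient.answer(topic)
        transcript.append(f"doctor asks about {topic}: {reply}")
        findings.add((topic, reply))
    for test in doctor.tests:
        result = measurement.measure(test)
        transcript.append(f"doctor requests {test}: {result}")
        findings.add((test, result))
    findings = frozenset(findings)
    return doctor.diagnose(findings), findings, transcript


# Toy case data, invented for illustration only.
doctor = DoctorAgent(topics=["chest pain"], tests=["ECG"])
patient = PatientAgent({"chest pain": "yes"})
measurement = MeasurementAgent({"ECG": "ST elevation"})

first, first_findings, _ = run_episode(doctor, patient, measurement)
doctor.remember(first_findings, "toy-diagnosis-A")  # label supplied after the episode
second, _, transcript = run_episode(doctor, patient, measurement)
```

In this toy setup the doctor has nothing to retrieve on the first encounter, and only after a labeled case is stored can a later, similar patient be matched against it. That is one simple way to read the abstract's "progressive learning as doctor agents interact with more patients"; the actual system relies on LLM agents, multi-agent discussion, and chain-of-thought reasoning, none of which are modeled here.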