Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

📅 2025-03-28

🤖 AI Summary

Large language models (LLMs) show inconsistent reasoning and unreliable decision-making in dynamic, multi-turn clinical diagnosis. Method: We propose MedAgentSim, an open-source simulated clinical environment in which doctor, patient, and measurement agents interact over multiple turns. The doctor agent takes a history from the patient, requests examinations and imaging on demand (e.g., blood pressure, ECG, MRI) from the measurement agent, and reasons toward a diagnosis. A self-improvement mechanism combines multi-agent discussion, chain-of-thought (CoT) reasoning, and experience-based retrieval-augmented generation (RAG), so that diagnostic strategies are refined as the doctor agent sees more patients. We also introduce an evaluation benchmark for dynamic, context-aware diagnostic dialogues. Contribution/Results: Evaluations across a range of simulated diagnostic scenarios indicate that the approach improves LLM diagnostic performance. The framework runs fully automatically and also supports a user-controlled mode in which a human interacts with either the doctor or the patient agent. The code, simulation tool, and benchmark are publicly released.
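
The experience-based retrieval idea can be illustrated with a small memory of past cases. This is a minimal sketch, assuming that only correctly diagnosed cases are stored and that a new patient's presentation is matched against them by text similarity; the names `ExperienceMemory`, `add`, and `retrieve` are hypothetical, and bag-of-words cosine similarity stands in for whatever retriever MedAgentSim actually uses.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def _tokens(text: str) -> Counter:
    """Lower-case bag-of-words representation of a case description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class ExperienceMemory:
    """Hypothetical store of past cases; not the authors' implementation."""
    records: list = field(default_factory=list)  # (case_text, diagnosis) pairs

    def add(self, case_text: str, diagnosis: str, correct: bool) -> None:
        # Assumption: only verified-correct cases are kept, so the memory
        # grows with useful experience as more patients are seen.
        if correct:
            self.records.append((case_text, diagnosis))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank stored cases by similarity to the new patient's presentation
        # and return the top k, e.g. to include as examples in the doctor's prompt.
        q = _tokens(query)
        ranked = sorted(self.records, key=lambda r: _cosine(q, _tokens(r[0])), reverse=True)
        return ranked[:k]
```

For example, after storing a correctly diagnosed pneumonia case, a query such as "patient with fever and productive cough" retrieves that case first because it shares the terms "fever" and "cough". Filtering on correctness is one plausible way to make the memory self-improving rather than accumulating mistakes.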

📝 Abstract

In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing LLMs' ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode that enables a human to interact with either the doctor or the patient agent. Comprehensive evaluations across various simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at https://medagentsim.netlify.app/.
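
The doctor-patient-measurement loop described in the abstract can be sketched as a toy, rule-based simulation. Everything here is hypothetical: the classes `PatientAgent`, `MeasurementAgent`, and `DoctorAgent`, the function `run_consultation`, and the decision rule are illustrative stand-ins, whereas in MedAgentSim each agent would be backed by an LLM rather than by dictionaries and a fixed plan.

```python
from dataclasses import dataclass


@dataclass
class PatientAgent:
    """Stand-in for an LLM patient: answers only from its hidden case record."""
    history: dict  # hypothetical map: question topic -> answer

    def answer(self, topic: str) -> str:
        return self.history.get(topic, "I'm not sure.")


@dataclass
class MeasurementAgent:
    """Stand-in for the measurement agent: returns results only when requested."""
    results: dict  # hypothetical map: test name -> result string

    def run(self, test: str) -> str:
        return self.results.get(test, "Test not available.")


class DoctorAgent:
    """Rule-based placeholder for the LLM doctor (assumption, not the paper's agent)."""

    def __init__(self, questions, tests):
        # Fixed plan: ask history questions first, then order tests.
        self.plan = [("ask", q) for q in questions] + [("test", t) for t in tests]
        self.transcript = []

    def next_action(self):
        return self.plan.pop(0) if self.plan else ("diagnose", None)

    def diagnose(self) -> str:
        text = " ".join(self.transcript).lower()
        # Toy decision rule purely for illustration.
        if "st elevation" in text and "chest pain" in text:
            return "myocardial infarction"
        return "undetermined"


def run_consultation(doctor, patient, measurer, max_turns=10):
    """Multi-turn loop: the doctor gathers evidence before committing to a diagnosis."""
    for _ in range(max_turns):
        kind, arg = doctor.next_action()
        if kind == "ask":
            doctor.transcript.append(f"Q:{arg} A:{patient.answer(arg)}")
        elif kind == "test":
            doctor.transcript.append(f"{arg}: {measurer.run(arg)}")
        else:
            break
    # If the turn budget runs out, the doctor must diagnose with what it has.
    return doctor.diagnose()
```

For example, a patient whose chief complaint is "crushing chest pain for an hour", combined with an ECG result containing "ST elevation", leads the toy doctor to return "myocardial infarction" after two evidence-gathering turns. Keeping test results inside a separate agent means the doctor only sees what it explicitly requests, which mirrors the active test-ordering requirement described above.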

Problem

Research questions and friction points this paper is trying to address.

Simulating realistic clinical interactions for LLM evaluation

Enhancing diagnostic accuracy through multi-agent conversations

Developing an automated benchmark for dynamic diagnostic assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent simulations for clinical interactions

Self-improving diagnostic strategies via feedback

Integrated multi-agent discussions and reasoning
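
One simple way to combine the opinions of several doctor agents is a majority vote over their proposed diagnoses. This sketch assumes each agent returns a free-text diagnosis; `aggregate_diagnoses` is a hypothetical helper, and MedAgentSim's multi-agent discussion may work differently.

```python
from collections import Counter


def aggregate_diagnoses(proposals):
    """Majority vote over diagnoses proposed by independent doctor agents.

    Returns (diagnosis, agreement), where agreement is the fraction of agents
    that proposed the winning diagnosis. Ties go to the diagnosis proposed first.
    """
    if not proposals:
        raise ValueError("need at least one proposal")
    # Light normalization so "Pneumonia" and "pneumonia " count as the same answer.
    normalized = [p.strip().lower() for p in proposals]
    counts = Counter(normalized)
    best = max(counts.values())
    # Counter keeps insertion order, so the first-seen top diagnosis wins ties.
    for diagnosis, n in counts.items():
        if n == best:
            return diagnosis, n / len(normalized)
```

For instance, proposals of "Pneumonia", "pulmonary embolism", and "pneumonia " yield "pneumonia" with an agreement of 2/3. The agreement fraction could serve as a rough confidence signal, for example to trigger further questioning when agents disagree.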

🔎 Similar Papers

Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

2024-05-05 · arXiv.org · Citations: 43
