MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM

📅 2025-09-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address two limitations of large language models (LLMs) in complex clinical diagnosis, namely isolated reasoning and the lack of reusable experiential knowledge, this paper proposes MACD, a multi-agent clinical diagnosis framework in which LLMs self-learn clinical knowledge. A multi-agent pipeline summarizes, refines, and applies diagnostic insights, so that disease-specific knowledge is distilled and accumulated over cases; this learned knowledge shows cross-model stability, transferability, and model-specific personalization. An extended MACD-human workflow has multiple LLM diagnostician agents hold iterative consultations, supported by an evaluator agent, with human oversight for cases where the agents do not reach agreement. Built on open-source models (Llama-3.1 8B/70B and DeepSeek-R1-Distill-Llama 70B), the system produces traceable rationales and supports human-AI collaborative decision-making. On 4,390 real-world cases spanning seven diseases, MACD improves primary diagnostic accuracy by up to 22.3% over established clinical guidelines. On a subset of the data it outperforms physician-only diagnosis by up to 16%, and the MACD-human workflow achieves an 18.6% improvement over physician-only diagnosis.

📝 Abstract

Large language models (LLMs) have demonstrated notable potential in medical applications, yet they face substantial challenges in handling complex real-world clinical diagnoses using conventional prompting methods. Current prompt engineering and multi-agent approaches typically optimize isolated inferences, neglecting the accumulation of reusable clinical experience. To address this, this study proposes a novel Multi-Agent Clinical Diagnosis (MACD) framework, which allows LLMs to self-learn clinical knowledge via a multi-agent pipeline that summarizes, refines, and applies diagnostic insights. It mirrors how physicians develop expertise through experience, enabling more focused and accurate diagnosis based on key disease-specific cues. We further extend it to a MACD-human collaborative workflow, in which multiple LLM-based diagnostician agents engage in iterative consultations, supported by an evaluator agent, with human oversight for cases where agreement is not reached. Evaluated on 4,390 real-world patient cases across seven diseases using diverse open-source LLMs (Llama-3.1 8B/70B, DeepSeek-R1-Distill-Llama 70B), MACD significantly improves primary diagnostic accuracy, outperforming established clinical guidelines with gains of up to 22.3% (MACD). On a subset of the data, it achieves performance on par with or exceeding that of human physicians (up to 16% improvement over physician-only diagnosis). Additionally, the MACD-human workflow achieves an 18.6% improvement over physician-only diagnosis. Moreover, the self-learned knowledge exhibits strong cross-model stability, transferability, and model-specific personalization, and the system can generate traceable rationales, enhancing explainability. Consequently, this work presents a scalable self-learning paradigm for LLM-assisted diagnosis, bridging the gap between the intrinsic knowledge of LLMs and real-world clinical practice.
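The summarize, refine, apply cycle described in the abstract can be pictured with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: `KnowledgeBase`, `summarize`, `refine`, `learn`, and `diagnose` are hypothetical names, the LLM is modeled as any function from a prompt string to text, the prompts are invented, solved cases with confirmed diagnoses are assumed to be available for learning, and refinement is reduced to deduplication plus a size cap (MACD uses LLM agents for this step).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# Hypothetical LLM interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class KnowledgeBase:
    """Accumulated, reusable diagnostic insights for one disease (illustrative)."""
    disease: str
    insights: List[str] = field(default_factory=list)

    def as_prompt_block(self) -> str:
        # Render the learned insights as a bullet list to prepend to a prompt.
        bullets = "\n".join(f"- {item}" for item in self.insights)
        return f"Learned diagnostic cues for {self.disease}:\n{bullets}"


def summarize(llm: LLM, case_text: str, confirmed: str) -> str:
    """Summarize step (hypothetical prompt): ask for one reusable cue from a solved case."""
    prompt = (f"Case: {case_text}\nConfirmed diagnosis: {confirmed}\n"
              "State one reusable diagnostic cue.")
    return llm(prompt).strip()


def refine(insights: List[str], new_insight: str, max_items: int = 20) -> List[str]:
    """Refine step (simplified assumption): skip empty or duplicate cues, keep the newest max_items."""
    if new_insight and new_insight not in insights:
        insights = insights + [new_insight]
    return insights[-max_items:]


def learn(llm: LLM, kb: KnowledgeBase,
          solved_cases: Iterable[Tuple[str, str]]) -> KnowledgeBase:
    """Accumulate knowledge over (case_text, confirmed_diagnosis) pairs."""
    for case_text, confirmed in solved_cases:
        kb.insights = refine(kb.insights, summarize(llm, case_text, confirmed))
    return kb


def diagnose(llm: LLM, kb: KnowledgeBase, case_text: str) -> str:
    """Apply step: condition a new diagnosis on the accumulated knowledge."""
    prompt = f"{kb.as_prompt_block()}\n\nPatient case: {case_text}\nPrimary diagnosis:"
    return llm(prompt).strip()


if __name__ == "__main__":
    # Made-up stub standing in for a real model, so the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        if "reusable diagnostic cue" in prompt:
            return "finding Y points to disease X"
        return "disease X"

    kb = learn(fake_llm, KnowledgeBase("disease X"),
               [("case 1", "disease X"), ("case 2", "disease X")])
    print(kb.insights)                        # ['finding Y points to disease X']
    print(diagnose(fake_llm, kb, "case 3"))   # disease X
```

Keeping the knowledge as plain text that is prepended to the prompt is one simple way to make it reusable across different backbone models, which is the kind of transferability the abstract reports; how MACD actually stores and injects its knowledge may differ.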

Problem

LLMs struggle with complex clinical diagnoses using standard prompting methods

Current approaches neglect the accumulation of reusable clinical experience

Need to bridge the gap between the intrinsic knowledge of LLMs and real-world clinical practice

Innovation

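Multi-agent pipeline through which LLMs self-learn clinical knowledge by summarizing, refining, and applying diagnostic insights

Iterative consultations among LLM diagnostician agents, supported by an evaluator agent, with human oversight when agents disagree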

Generates traceable rationales for explainable diagnosis
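The MACD-human consultation workflow can be read as a loop: diagnostician agents give opinions, an evaluator checks for agreement, agents revise after seeing each other's opinions, and unresolved cases go to a human. The sketch below is a hypothetical illustration, not the authors' implementation: agents are plain Python callables standing in for LLM calls, `consult` and `evaluator_consensus` are invented names, the evaluator is reduced to a case-insensitive unanimity check (the paper describes an LLM-based evaluator agent), and `max_rounds=3` is an arbitrary assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface: (case_text, all opinions from the previous round) -> diagnosis.
Diagnostician = Callable[[str, List[str]], str]
# Hypothetical human reviewer: (case_text, unresolved opinions) -> final diagnosis.
Human = Callable[[str, List[str]], str]


def evaluator_consensus(opinions: List[str]) -> Optional[str]:
    """Simplified evaluator (assumption): return the agreed diagnosis if all
    opinions match case-insensitively, otherwise None."""
    normalized = {op.strip().lower() for op in opinions}
    return opinions[0].strip() if len(normalized) == 1 else None


def consult(case_text: str, agents: List[Diagnostician], human: Human,
            max_rounds: int = 3) -> Tuple[str, str]:
    """Run consultation rounds; escalate to a human if no consensus is reached.

    Returns (final_diagnosis, how_it_was_reached).
    """
    previous: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Each agent sees every opinion from the previous round (empty in round 1).
        opinions = [agent(case_text, previous) for agent in agents]
        agreed = evaluator_consensus(opinions)
        if agreed is not None:
            return agreed, f"consensus in round {round_no}"
        previous = opinions
    # No agreement after max_rounds: human oversight makes the final call.
    return human(case_text, previous), "escalated to human"
```

For example, with two agents that always answer "A" and a third that starts with "B" but adopts the majority opinion from the previous round, the evaluator finds no agreement in round 1 and unanimity in round 2, so `consult` returns `("A", "consensus in round 2")`. Two agents that never move ("A" versus "B") exhaust the rounds and the human reviewer's answer is returned instead.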
