WSI-Agents: A Collaborative Multi-Agent System for Multi-Modal Whole Slide Image Analysis

📅 2025-07-19
🤖 AI Summary
Current multimodal large language models (MLLMs) are not accurate enough for multi-task pathological analysis of whole-slide images (WSIs), and they lack pathology-aware mechanisms for agent collaboration. To address this, we propose the first collaborative multi-agent system designed specifically for WSIs, built on a three-stage framework: task allocation, consistency verification, and integrated summarization. The system combines MLLMs, a domain-specific knowledge base, and a model library, and adds internal logical consistency checking, external validation against pathological knowledge, and generation of visual interpretation maps. Evaluated on a multimodal WSI benchmark, the system outperforms state-of-the-art MLLMs and medical agent approaches across classification, localization, and report generation tasks, with an average accuracy gain of +5.2% and improved clinical interpretability. This work establishes a novel paradigm for generalizable, precise, and trustworthy AI-assisted pathological diagnosis.

📝 Abstract
Whole slide images (WSIs) are vital in digital pathology, enabling gigapixel tissue analysis across various pathological tasks. While recent advancements in multi-modal large language models (MLLMs) allow multi-task WSI analysis through natural language, they often underperform compared to task-specific models. Collaborative multi-agent systems have emerged as a promising solution to balance versatility and accuracy in healthcare, yet their potential remains underexplored in pathology-specific domains. To address these issues, we propose WSI-Agents, a novel collaborative multi-agent system for multi-modal WSI analysis. WSI-Agents integrates specialized functional agents with robust task allocation and verification mechanisms to enhance both task-specific accuracy and multi-task versatility through three components: (1) a task allocation module that assigns tasks to expert agents using a model zoo of patch-level and WSI-level MLLMs, (2) a verification mechanism that ensures accuracy through internal consistency checks and external validation using pathology knowledge bases and domain-specific models, and (3) a summary module that synthesizes the final summary with visual interpretation maps. Extensive experiments on multi-modal WSI benchmarks show the superiority of WSI-Agents over current WSI MLLMs and medical agent frameworks across diverse tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhances multi-task WSI analysis accuracy and versatility
Integrates specialized agents for improved pathology-specific performance
Ensures accuracy via verification and task allocation mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative multi-agent system for WSI analysis
Task allocation using patch-level and WSI-level MLLMs
Verification via pathology knowledge and domain models
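
The three components described in the abstract (task allocation, verification, summary) can be pictured as a simple pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: every name in it (`Candidate`, `allocate`, `internal_consistency`, `external_validation`, `summarize`, `run_pipeline`, the toy agents and the toy knowledge base) is hypothetical, the scoring rules are invented for the example, and real expert agents would be MLLMs rather than lambdas. Visual interpretation maps are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical stand-in: an "agent" is any callable mapping a question to an answer.
Agent = Callable[[str], str]


@dataclass
class Candidate:
    agent_name: str
    answer: str
    score: float = 0.0


def allocate(task: str, model_zoo: Dict[str, Dict[str, Agent]]) -> Dict[str, Agent]:
    """Task allocation: look up the expert agents registered for this task type."""
    return model_zoo.get(task, {})


def internal_consistency(candidates: List[Candidate]) -> None:
    """Internal check (assumed form): reward answers that other agents agree with."""
    for c in candidates:
        agree = sum(1 for other in candidates if other.answer == c.answer)
        c.score += agree / len(candidates)


def external_validation(candidates: List[Candidate], knowledge_base: Set[str]) -> None:
    """External check (assumed form): reward answers present in a knowledge base."""
    for c in candidates:
        if c.answer in knowledge_base:
            c.score += 1.0


def summarize(candidates: List[Candidate]) -> Candidate:
    """Summary step: keep the highest-scoring candidate (first one wins ties)."""
    return max(candidates, key=lambda c: c.score)


def run_pipeline(task: str, question: str,
                 model_zoo: Dict[str, Dict[str, Agent]],
                 knowledge_base: Set[str]) -> Candidate:
    agents = allocate(task, model_zoo)
    if not agents:
        raise ValueError(f"no agents registered for task {task!r}")
    candidates = [Candidate(name, agent(question)) for name, agent in agents.items()]
    internal_consistency(candidates)
    external_validation(candidates, knowledge_base)
    return summarize(candidates)


# Toy data, purely illustrative.
toy_zoo: Dict[str, Dict[str, Agent]] = {
    "classification": {
        "patch_level_model": lambda q: "adenocarcinoma",
        "wsi_level_model_a": lambda q: "adenocarcinoma",
        "wsi_level_model_b": lambda q: "unknown",
    }
}
toy_kb = {"adenocarcinoma", "squamous cell carcinoma"}

best = run_pipeline("classification", "What is the tumor subtype?", toy_zoo, toy_kb)
print(best.agent_name, best.answer)
```

In this toy run, the two agreeing agents score higher on the internal check and their answer also passes the knowledge-base check, so the summary step selects it. The point is only the ordering of the stages: candidates are produced first, scored by independent checks second, and merged last.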
Authors

Xinheng Lyu
PhD student, University of Nottingham Ningbo & Shenzhen University
Medical Image, Computational Pathology

Yuci Liang
College of Computer Science and Software Engineering, Shenzhen University, China

Wenting Chen
Department of Electrical Engineering, City University of Hong Kong, Hong Kong

Meidan Ding
Shenzhen University
Computer Vision, Medical Image Analysis

Jiaqi Yang
School of Computer Science, University of Nottingham Ningbo China, China

Guolin Huang
College of Computer Science and Software Engineering, Shenzhen University, China; Wuyi University, China

Daokun Zhang
University of Nottingham Ningbo China
Graph Learning, Data Mining, Machine Learning

Xiangjian He
University of Nottingham Ningbo China (2022.5--), University of Technology Sydney (1999.2-2022.5)
Computer Vision, Machine Learning, Data Analytics

Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis