CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current computational pathology models fail to emulate pathologists’ multi-scale diagnostic reasoning, specifically the “low-magnification overview to high-magnification focus” workflow. They rely instead on single-scale encoding or end-to-end report generation, and therefore lack interpretable, stepwise inference. This paper introduces CPathAgent, the first agent-based pathology diagnostic model that autonomously performs multi-scale zooming, visual navigation, and collaborative reasoning across patch, region, and whole-slide levels to mimic real-world slide review. Key innovations include: (1) an agent-driven, multi-stage training paradigm; (2) a unified three-scale modeling framework; (3) a cross-scale feature alignment mechanism; and (4) PathMMU-HR², the first expert-annotated benchmark for high-resolution regional analysis. The method achieves state-of-the-art performance across patch-level classification, regional diagnosis, and whole-slide interpretation tasks, generating more comprehensive, traceable, and clinically credible diagnostic reports.

📝 Abstract
Recent advances in computational pathology have led to the emergence of numerous foundation models. However, these approaches fail to replicate the diagnostic process of pathologists, as they either rely on general-purpose encoders with multi-instance learning for classification or directly apply multimodal models to generate reports from images. A significant limitation is their inability to emulate the diagnostic logic employed by pathologists, who systematically examine slides at low magnification for an overview before progressively zooming in on suspicious regions to formulate comprehensive diagnoses. To address this gap, we introduce CPathAgent, an innovative agent-based model that mimics pathologists' reasoning processes by autonomously executing zoom-in/out and navigation operations across pathology images based on observed visual features. To achieve this, we develop a multi-stage training strategy that unifies patch-level, region-level, and whole-slide capabilities within a single model, which is essential for mimicking pathologists, who require understanding and reasoning capabilities across all three scales. This approach generates substantially more detailed and interpretable diagnostic reports than existing methods, particularly for huge region understanding. Additionally, we construct PathMMU-HR², an expert-validated benchmark and the first for huge region analysis, a critical intermediate scale between patches and whole slides, as diagnosticians typically examine several key regions rather than entire slides at once. Extensive experiments demonstrate that CPathAgent consistently outperforms existing approaches across three scales of benchmarks, validating the effectiveness of our agent-based diagnostic approach and highlighting a promising direction for the future development of computational pathology.
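The "overview first, then zoom in on suspicious regions" workflow described in the abstract can be illustrated with a small coarse-to-fine search. This is only a toy sketch, not the CPathAgent method: in the paper the model itself decides where to zoom and navigate from visual features, whereas here a hypothetical `score` callable stands in for that decision. The names `View`, `split`, `explore`, and the lesion coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class View:
    """A square field of view, in full-resolution pixel coordinates."""
    x: int      # left edge
    y: int      # top edge
    size: int   # side length


def split(view: View) -> List[View]:
    """One zoom-in step: split a view into its four quadrants."""
    half = view.size // 2
    return [View(view.x + dx, view.y + dy, half)
            for dy in (0, half) for dx in (0, half)]


def explore(root: View, score: Callable[[View], float],
            min_size: int, top_k: int = 1) -> List[View]:
    """Coarse-to-fine search: keep the top_k highest-scoring quadrants
    at each level and zoom into them until views would drop below min_size."""
    trace = [root]
    frontier = [root]
    while frontier and frontier[0].size // 2 >= min_size:
        children = [c for v in frontier for c in split(v)]
        children.sort(key=score, reverse=True)
        frontier = children[:top_k]
        trace.extend(frontier)
    return trace


# Hypothetical stand-in for "suspiciousness": 1.0 if the view contains
# a made-up lesion location, else 0.0.
LESION = (700, 300)


def contains_lesion(view: View) -> float:
    px, py = LESION
    inside = (view.x <= px < view.x + view.size
              and view.y <= py < view.y + view.size)
    return 1.0 if inside else 0.0


trace = explore(View(0, 0, 1024), contains_lesion, min_size=256)
for step, view in enumerate(trace):
    print(step, view)
```

The returned `trace` is the sequence of views visited from the overview down to the finest level, which is the kind of stepwise record that makes a zoom-based diagnosis traceable. Raising `top_k` follows several candidate regions per level instead of one.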
Problem

Research questions and friction points this paper is trying to address.

Mimic pathologists' diagnostic logic in pathology image analysis
Address inability to emulate systematic zoom-in/out examination
Improve interpretable huge region understanding in computational pathology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-based model mimicking pathologists' diagnostic logic
Multi-stage training for patch, region, and slide analysis
First benchmark for huge region pathology analysis
Authors
Yuxuan Sun
College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Yixuan Si
Research Center for Industries of the Future and School of Engineering, Westlake University, China
Chenglu Zhu
Research Center for Industries of the Future and School of Engineering, Westlake University, China
Kai Zhang
Department of Computer Science and Engineering, The Ohio State University, USA
Zhongyi Shui
Ph.D. Candidate, Westlake University & Zhejiang University
Bowen Ding
College of Computer Science and Technology, Zhejiang University, China; Research Center for Industries of the Future and School of Engineering, Westlake University, China
Tao Lin
Research Center for Industries of the Future and School of Engineering, Westlake University, China
Lin Yang
Research Center for Industries of the Future and School of Engineering, Westlake University, China