Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current pathology AI lacks interpretable, scalable whole-slide image (WSI) diagnostic agents, largely because clinicians' implicit reading behaviors (region selection, zoom-level decisions, and reasoning chains) are difficult to capture, structure, and model. Method: We introduce the AI Session Recorder, which transforms real expert browsing logs into standardized, human-AI co-verified behavioral supervision signals, yielding the Pathology-CoT dataset for visual chain-of-thought reasoning at minimal annotation cost. The approach comprises behavioral log standardization, a two-stage agent architecture (region proposal followed by behavior-guided reasoning), and lightweight human verification. Contribution/Results: Evaluated on gastrointestinal lymph-node metastasis detection, the system achieves 84.5% precision, 100.0% recall, and 75.4% accuracy, substantially outperforming OpenAI o3, and generalizes across diverse backbone networks.

📝 Abstract
Diagnosing a whole-slide image is an interactive, multi-stage process involving changes in magnification and movement between fields. Although recent pathology foundation models are strong, practical agentic systems that decide what field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. The blocker is data: scalable, clinically aligned supervision of expert viewing behavior that is tacit and experience-based, not written in textbooks or online, and therefore absent from large language model training. We introduce the AI Session Recorder, which works with standard WSI viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect or peek at discrete magnifications) and bounding boxes. A lightweight human-in-the-loop review turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired "where to look" and "why it matters" supervision produced in roughly one-sixth the labeling time. Using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes regions of interest and then performs behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection, it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the state-of-the-art OpenAI o3 model and generalizing across backbones. To our knowledge, this constitutes one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.
Problem

Research questions and friction points this paper is trying to address.

Capturing expert diagnostic behavior from slide navigation for AI training
Developing agentic systems that autonomously examine pathology slides
Creating scalable supervision data from routine clinical viewing patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recording expert navigation behavior in WSI viewers
Converting viewer logs into standardized behavioral commands
Building two-stage agent for region proposal and reasoning
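The log-to-command conversion described above can be illustrated with a minimal sketch. Note that the field names, dwell-time threshold, and magnification levels below are hypothetical placeholders, not the authors' actual schema; the paper only specifies that viewer logs become "inspect"/"peek" commands at discrete magnifications plus bounding boxes.

```python
from dataclasses import dataclass

@dataclass
class Command:
    action: str        # "inspect" (sustained viewing) or "peek" (brief glance)
    magnification: int # snapped to a discrete level, e.g. 5x/10x/20x/40x
    bbox: tuple        # (x, y, w, h) of the viewed field in slide coordinates

LEVELS = [5, 10, 20, 40]   # assumed discrete magnification levels
DWELL_THRESHOLD = 2.0      # seconds; assumed cutoff between "peek" and "inspect"

def snap_magnification(zoom: float) -> int:
    """Snap a continuous zoom value to the nearest discrete level."""
    return min(LEVELS, key=lambda m: abs(m - zoom))

def logs_to_commands(log_entries):
    """Convert raw viewer log entries into standardized behavioral commands."""
    commands = []
    for entry in log_entries:
        action = "inspect" if entry["dwell_sec"] >= DWELL_THRESHOLD else "peek"
        commands.append(Command(
            action=action,
            magnification=snap_magnification(entry["zoom"]),
            bbox=tuple(entry["viewport"]),
        ))
    return commands

# Example: two log entries, one sustained look at ~10x, one glance at ~40x.
log = [
    {"zoom": 9.4, "dwell_sec": 6.1, "viewport": (1024, 2048, 512, 512)},
    {"zoom": 38.7, "dwell_sec": 0.8, "viewport": (1400, 2300, 128, 128)},
]
for cmd in logs_to_commands(log):
    print(cmd.action, cmd.magnification, cmd.bbox)
```

These commands, paired with AI-drafted and human-verified rationales, would form the "where to look" / "why it matters" supervision that trains the two-stage agent.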
Sheng Wang
Department of Pathology and Laboratory Medicine, University of Pennsylvania
Ruiming Wu
Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania
Charles Herndon
Department of Pathology, University of California at San Francisco
Yihang Liu
Department of Electrical and System Engineering, University of Pennsylvania
Shunsuke Koga
Department of Pathology and Laboratory Medicine, University of Pennsylvania
Jeanne Shen
Department of Pathology, Stanford University
Zhi Huang
Assistant Professor, University of Pennsylvania
Biomedical Data Science · AI · Computational Pathology