PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of navigating to clinically relevant regions of interest (ROIs) in whole-slide images (WSIs), whose massive data volume makes exhaustive inspection impractical, this paper proposes a question-driven, interpretable multi-scale WSI navigation method. Methodologically, it integrates multimodal large language models with vision–text alignment reasoning and introduces a multi-round iterative self-reflection mechanism to construct an auditable reasoning chain, enabling clinical-question-guided attention localization without dense pixel-level annotations. The key contribution lies in emulating pathologists' visual reasoning while balancing computational efficiency, interpretability, and traceability of diagnostic evidence. Experiments demonstrate that the method improves AUROC by 6.7% on histological subtype classification and by 3.1% on longitudinal analysis tasks. Furthermore, diagnostic reports generated from the identified ROIs achieve 10% higher accuracy than those produced by GPT-4o for breast cancer diagnosis.

📝 Abstract
Deciphering the tumor microenvironment from Whole Slide Images (WSIs) is intriguing, as it is key to cancer diagnosis, prognosis, and treatment response. While these gigapixel images offer a comprehensive portrait of cancer, their extremely large size, often exceeding 10 billion pixels, makes it challenging and time-consuming to navigate to the regions that support diverse clinical inspection. Inspired by pathologists, who navigate WSIs through a combination of sampling, reasoning, and self-reflection, we propose "PathReasoning", a multi-modal reasoning agent that iteratively navigates across WSIs through multiple rounds of reasoning and refinement. Specifically, starting from randomly sampled candidate regions, PathReasoning reviews its current selections with self-reflection, reasons over the correspondence between visual observations and clinical questions, and concludes by proposing new regions to explore. Across rounds, PathReasoning builds a reasoning chain that gradually directs attention to diagnostically relevant areas. PathReasoning turns each whole slide into a sequence of question-guided views, allowing the model to efficiently find informative ROIs within a fixed number of steps, without the need for dense pixel-level annotations. PathReasoning substantially outperforms strong ROI-selection approaches, improving AUROC by 6.7% and 3.1% on subtyping and longitudinal analysis tasks, respectively. The high-quality ROIs further support accurate report generation on breast cancer, significantly outperforming standard GPT-4o by 10% in accuracy. PathReasoning prioritizes question-specific regions and constructs interpretable reasoning chains, supporting efficient slide review, consistent diagnostic interpretations, comprehensive reporting, and evidence traceability in digital pathology.
Problem

Research questions and friction points this paper is trying to address.

Navigating large whole-slide images to find diagnostically relevant regions
Reducing time and effort for clinical inspection of tumor microenvironment
Improving accuracy in cancer subtyping and report generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal reasoning agent iteratively navigates WSIs
Builds reasoning chain to direct attention to relevant areas
Converts whole slides into question-guided view sequences
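The iterative loop described above (random initial sampling, per-round self-reflection against the clinical question, and proposal of new regions) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `relevance_score`, the region dictionaries, and the keep/replace heuristic are all hypothetical stand-ins for the multimodal-LLM reasoning the paper actually uses.

```python
import random

def relevance_score(region, question):
    # Toy stand-in: in the paper, a multimodal LLM judges the correspondence
    # between the region's visual content and the clinical question.
    return region["features"].get(question, 0.0)

def path_reasoning_navigate(wsi_regions, question, n_rounds=5, k=4, seed=0):
    """Iteratively refine a set of k candidate ROIs over a fixed number of rounds."""
    rng = random.Random(seed)
    # Round 0: start from randomly sampled candidate regions.
    selected = rng.sample(wsi_regions, k)
    reasoning_chain = []
    for round_idx in range(n_rounds):
        # Self-reflection: score each current region against the question.
        scores = [relevance_score(r, question) for r in selected]
        # Record this round so the trajectory stays auditable.
        reasoning_chain.append({"round": round_idx,
                                "regions": [r["id"] for r in selected],
                                "scores": scores})
        # Keep the best-scoring half; propose replacements from the rest
        # of the slide to explore in the next round.
        ranked = sorted(zip(scores, selected), key=lambda p: p[0], reverse=True)
        keep = [r for _, r in ranked[: k // 2]]
        kept_ids = {r["id"] for r in keep}
        pool = [r for r in wsi_regions if r["id"] not in kept_ids]
        selected = keep + rng.sample(pool, k - len(keep))
    return selected, reasoning_chain
```

The fixed `n_rounds` budget mirrors the paper's claim of finding informative ROIs "within a fixed number of steps", and the recorded `reasoning_chain` corresponds to the auditable reasoning chain used for evidence traceability.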
Hanwen Xu
School of Computer Science and Engineering, University of Washington, Seattle, WA
Sheng Wang
School of Computer Science and Engineering, University of Washington, Seattle, WA