🤖 AI Summary
To address the challenge of navigating clinically relevant regions of interest (ROIs) in whole-slide images (WSIs), whose massive data volume makes exhaustive inspection impractical, this paper proposes a question-driven, interpretable multi-scale WSI navigation method. Methodologically, it integrates multimodal large language models with vision–text alignment reasoning and introduces a multi-round iterative self-reflection mechanism that constructs an auditable reasoning chain, enabling clinical-question-guided attention localization without dense pixel-level annotations. The key contribution lies in emulating pathologists' visual reasoning while balancing computational efficiency, interpretability, and traceability of diagnostic evidence. Experiments demonstrate that the method improves AUROC by 6.7% on histological subtype classification and by 3.1% on longitudinal analysis tasks. Furthermore, diagnostic reports generated from the identified ROIs achieve 10% higher accuracy than those produced by GPT-4o for breast cancer diagnosis.
📝 Abstract
Deciphering the tumor microenvironment from Whole Slide Images (WSIs) is compelling, as it is key to cancer diagnosis, prognosis, and treatment response. While these gigapixel images offer a comprehensive portrait of cancer, their extremely large size, often exceeding 10 billion pixels, makes it challenging and time-consuming to navigate to the regions relevant to diverse clinical inspections. Inspired by pathologists, who navigate WSIs through a combination of sampling, reasoning, and self-reflection, we propose "PathReasoning", a multimodal reasoning agent that iteratively navigates across WSIs through multiple rounds of reasoning and refinement. Specifically, starting from randomly sampled candidate regions, PathReasoning reviews its current selections with self-reflection, reasons over the correspondence between visual observations and the clinical question, and concludes each round by proposing new regions to explore. Across rounds, PathReasoning builds a reasoning chain that gradually directs attention to diagnostically relevant areas. PathReasoning turns each whole slide into a sequence of question-guided views, allowing the model to efficiently find informative ROIs within a fixed number of steps, without the need for dense pixel-level annotations. PathReasoning substantially outperforms strong ROI-selection approaches, improving AUROC by 6.7% on subtyping and 3.1% on longitudinal analysis tasks. The high-quality ROIs further support accurate report generation for breast cancer, significantly outperforming standard GPT-4o by 10% in accuracy. PathReasoning prioritizes question-specific regions and constructs interpretable reasoning chains, supporting efficient slide review, consistent diagnostic interpretations, comprehensive reporting, and evidence traceability in digital pathology.
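The sample–reflect–propose loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the grid coordinates standing in for WSI patches, and the toy `score_region` heuristic (a stand-in for the multimodal LLM's question-conditioned relevance judgment) are all assumptions for illustration.

```python
import random

# Hypothetical sketch of the iterative navigation loop: sample candidate
# regions, self-reflect by scoring them against the clinical question,
# then propose new regions near the most promising one.

def score_region(region, question):
    # Toy stand-in for the multimodal LLM's relevance judgment; here it
    # simply rewards regions close to a fixed "diagnostic" hotspot.
    hotspot = (70, 30)
    return -abs(region[0] - hotspot[0]) - abs(region[1] - hotspot[1])

def navigate(question, grid_size=100, n_candidates=4, n_rounds=5, seed=0):
    rng = random.Random(seed)
    # Round 0: randomly sampled candidate regions (grid coordinates).
    candidates = [(rng.randrange(grid_size), rng.randrange(grid_size))
                  for _ in range(n_candidates)]
    chain = []  # reasoning chain: (round, best_region, score)
    for r in range(n_rounds):
        # Self-reflection: rank current selections against the question.
        ranked = sorted(candidates,
                        key=lambda c: score_region(c, question),
                        reverse=True)
        best = ranked[0]
        chain.append((r, best, score_region(best, question)))
        # Propose new regions to explore around the best one; keeping the
        # best region ensures the chain never regresses.
        candidates = [best] + [
            (min(grid_size - 1, max(0, best[0] + rng.randint(-10, 10))),
             min(grid_size - 1, max(0, best[1] + rng.randint(-10, 10))))
            for _ in range(n_candidates - 1)]
    return chain

chain = navigate("Which regions show invasive carcinoma?")
print(chain[-1])  # final round's best region and its relevance score
```

Because each round retains the previous best region, the chain's scores are non-decreasing, mirroring how attention is gradually directed toward diagnostically relevant areas within a fixed step budget.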