PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of efficiently localizing sparse, high-resolution pathological evidence under a strict inspection budget in whole-slide image visual question answering (WSI-VQA). It proposes PathNavigate, a training-free pathology agent that follows a “scan-search-readout” pipeline: first, a low-magnification scan builds a slide-specific “surprise field” that marks a pool of abnormal regions; then, guided by question semantics, the agent selects high-magnification fields of view within this pool, extracts local evidence, and generates an answer. The core innovations are a surprise-guided scanning mechanism and a shared online memory module, which together make evidence localization more complete and more interpretable without task-specific training. Experiments on the WSI-VQA and SlideBench-BCNB benchmarks show improved answer accuracy together with efficient, traceable evidence-selection trajectories.

📝 Abstract

Whole-slide image visual question answering (WSI-VQA) frames pathology as an extreme-context search problem: to answer a free-form clinical query, a system must first navigate a gigapixel slide under a strict inspection budget to locate sparse, high-resolution evidence. Existing approaches largely fall into two paradigms: i) supervised pathology multimodal large language models (MLLMs) and agents can absorb localization and reasoning into learned modules, but they often couple navigation to task-specific supervision and retraining, limiting their practicality; ii) training-free pathology agents avoid this cost by keeping core models frozen, but often follow a question-first design, constructing the initial candidate set mainly from query-conditioned relevance. This can miss decisive morphology that is not named in the question, and force heavier inference-time scaffolding. To address this challenge, we introduce PathNavigate, a training-free pathology agent built around a scan-search-readout routine. Before question matching, PathNavigate scans the current slide at low magnification with a shared online memory module over frozen pathology features, producing a slide-specific surprise field that marks an abnormal-region pool. It then applies question-conditioned PLIP relevance only within this pool to select high-magnification search targets. Finally, it extracts local high-magnification evidence and answers with a frozen perceptor-adjudicator stack, using the same online memory as slide-level context. Experiments on WSI-VQA and SlideBench-BCNB show that the proposed scan-search-readout design improves answer accuracy and yields more interpretable evidence-selection trajectories with higher efficiency. The code is available online.
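The scan and search stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the surprise field or the memory update is computed, so the running-mean memory, the cosine-distance surprise score, and the names `surprise_scan`, `select_targets`, `pool_size`, and `budget` are all assumptions made for illustration. Feature vectors stand in for frozen pathology features (scan) and PLIP embeddings (search).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def surprise_scan(scan_feats: List[Vector], momentum: float = 0.9) -> List[float]:
    """Hypothetical low-magnification scan.

    Assumption: the memory is a running mean of tile features, and a tile's
    surprise is 1 - cosine(tile, memory) measured before the memory update.
    """
    memory = list(scan_feats[0])
    scores = []
    for feat in scan_feats:
        scores.append(1.0 - cosine(feat, memory))
        memory = [momentum * m + (1.0 - momentum) * f
                  for m, f in zip(memory, feat)]
    return scores


def select_targets(scan_feats: List[Vector], plip_feats: List[Vector],
                   question_feat: Vector, pool_size: int, budget: int) -> List[int]:
    """Scan first, then rank by question relevance only inside the pool."""
    surprise = surprise_scan(scan_feats)
    # Abnormal-region pool: the most surprising tiles on this slide.
    pool = sorted(range(len(scan_feats)),
                  key=lambda i: surprise[i], reverse=True)[:pool_size]
    # Question-conditioned relevance is applied within the pool only.
    ranked = sorted(pool,
                    key=lambda i: cosine(plip_feats[i], question_feat),
                    reverse=True)
    return ranked[:budget]
```

The point of the ordering is that a tile with high question relevance but low surprise never becomes a search target, which is the contrast with the question-first design the abstract describes. The readout stage (the perceptor-adjudicator stack) is omitted because the abstract gives no detail that could be sketched faithfully.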

Problem

Research questions and friction points this paper is trying to address.

Whole-slide image

Visual question answering

Pathology

Evidence localization

Training-free agent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free

Surprise-guided scan

Shared slide memory