Measuring and Predicting Where and When Pathologists Focus their Visual Attention while Grading Whole Slide Images of Cancer

2025-08-03
Citations: 0
Influential: 0
AI Summary
This study addresses the problem of predicting where and when pathologists direct their visual attention as they grade whole-slide images (WSIs) of prostate cancer. To model dynamic scanning trajectories, the authors propose a two-stage Transformer architecture: the first stage predicts attention heatmaps at multiple magnifications, and the second stage autoregressively predicts fixation sequences. They also introduce a semantics-preserving fixation extraction algorithm that jointly captures magnification level, spatial coordinates, and temporal dynamics. The model combines viewport trajectory data recorded from a digital microscope with multi-magnification histopathological features. Trained and tested on viewing data from 43 pathologists across 123 WSIs, it outperforms chance and baseline models. The resulting framework predicts expert WSI scanpaths and could support quantifiable attention assessment tools for pathology training, diagnostic assistance, and medical education.

Abstract
The ability to predict the attention of expert pathologists could lead to decision support systems for better pathology training. We developed methods to predict the spatio-temporal (where and when) movements of pathologists' attention as they grade whole slide images (WSIs) of prostate cancer. We characterize a pathologist's attention trajectory by their x, y, and m (magnification) movements of a viewport as they navigate WSIs using a digital microscope. This information was obtained from 43 pathologists across 123 WSIs, and we consider the task of predicting the pathologist attention scanpaths constructed from the viewport centers. We introduce a fixation extraction algorithm that simplifies an attention trajectory by extracting fixations in the pathologist's viewing while preserving semantic information, and we use these pre-processed data to train and test a two-stage model to predict the dynamic (scanpath) allocation of attention during WSI reading via intermediate attention heatmap prediction. In the first stage, a transformer-based sub-network predicts the attention heatmaps (static attention) across different magnifications. In the second stage, we predict the attention scanpath by sequentially modeling the next fixation points in an autoregressive manner using a transformer-based approach, starting at the WSI center and leveraging multi-magnification feature representations from the first stage. Experimental results show that our scanpath prediction model outperforms chance and baseline models. Tools developed from this model could assist pathology trainees in learning to allocate their attention during WSI reading like an expert.
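The fixation extraction algorithm itself is not specified in this summary. As a rough illustration of the general idea only, the sketch below groups consecutive viewport samples (t, x, y, m) into one fixation while the viewport stays at a single magnification and within a small distance of where the group started, and it drops groups that are too brief. The names `extract_fixations`, `max_shift`, and `min_duration`, as well as the threshold values, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One viewport sample: (time in seconds, center x, center y, magnification m).
Sample = Tuple[float, float, float, float]


@dataclass
class Fixation:
    x: float         # mean viewport-center x over the group
    y: float         # mean viewport-center y over the group
    m: float         # magnification shared by the group
    start: float     # time of the first sample in the group
    duration: float  # time between first and last sample


def extract_fixations(samples: List[Sample],
                      max_shift: float = 50.0,     # hypothetical threshold
                      min_duration: float = 0.5    # hypothetical threshold
                      ) -> List[Fixation]:
    """Merge consecutive samples that keep the same magnification and stay
    within `max_shift` of the first sample of the current group."""
    fixations: List[Fixation] = []
    group: List[Sample] = []

    def flush() -> None:
        # Turn the current group into a fixation if it lasted long enough.
        if not group:
            return
        duration = group[-1][0] - group[0][0]
        if duration >= min_duration:
            n = len(group)
            fixations.append(Fixation(
                x=sum(s[1] for s in group) / n,
                y=sum(s[2] for s in group) / n,
                m=group[0][3],
                start=group[0][0],
                duration=duration,
            ))
        group.clear()

    for s in samples:
        if group:
            _, x0, y0, m0 = group[0]
            moved = ((s[1] - x0) ** 2 + (s[2] - y0) ** 2) ** 0.5 > max_shift
            if moved or s[3] != m0:
                flush()  # viewport panned away or zoomed: close the group
        group.append(s)
    flush()
    return fixations
```

Anchoring each group to its first sample keeps the rule simple. A real pipeline would likely need thresholds that scale with magnification, since the same pixel shift covers very different tissue areas at different zoom levels.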
Problem

Research questions and friction points this paper is trying to address.

Predict pathologists' visual attention on cancer slides
Model spatio-temporal focus during slide grading
Improve pathology training via attention prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predict pathologists' attention using spatio-temporal data
Fixation extraction algorithm preserves semantic trajectory information
Two-stage transformer model predicts dynamic attention scanpaths
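To make autoregressive scanpath prediction concrete, here is a minimal sketch in which each next fixation depends on the fixations already made. It is a stand-in and not the paper's second-stage transformer: it starts at the image center, as the abstract describes, but then simply picks the highest-valued unvisited location of a given attention heatmap, with inhibition of return carrying the history. The function `greedy_scanpath` and its parameters are hypothetical.

```python
import numpy as np


def greedy_scanpath(heatmap: np.ndarray,
                    n_fixations: int = 5,
                    ior_radius: int = 2) -> list:
    """Roll out a scanpath one fixation at a time from a static heatmap.

    Returns a list of (row, col) grid positions, starting at the center.
    """
    h, w = heatmap.shape
    priority = heatmap.astype(float).copy()
    rows, cols = np.mgrid[0:h, 0:w]
    path = [(h // 2, w // 2)]  # first fixation: center of the image
    for _ in range(n_fixations - 1):
        r, c = path[-1]
        # Inhibition of return: suppress the neighborhood just visited so
        # the next choice is conditioned on the path so far.
        visited = (rows - r) ** 2 + (cols - c) ** 2 <= ior_radius ** 2
        priority[visited] = -np.inf
        nxt = np.unravel_index(np.argmax(priority), priority.shape)
        path.append((int(nxt[0]), int(nxt[1])))
    return path
```

A learned model would replace the `argmax` step with a network that scores candidate next fixations, including magnification, from the fixation history and image features. The loop structure, one prediction per step fed back as input, is the part this sketch is meant to show.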
Authors

Souradeep Chakraborty
Applied Scientist, Amazon
Computer Vision, Generative AI, Recommendation Systems
Ruoyu Xue
Department of Computer Science, Stony Brook University, Stony Brook, 11794, NY, USA
Rajarsi Gupta
Biomedical Informatics, Stony Brook University
Biomedical Informatics
Oksana Yaskiv
Department of Pathology and Laboratory Medicine, Northwell Health Laboratories, Greenvale, 11548, NY, USA
Constantin Friedman
Department of Pathology and Laboratory Medicine, Northwell Health Laboratories, Greenvale, 11548, NY, USA
Natallia Sheuka
Department of Pathology and Laboratory Medicine, Northwell Health Laboratories, Greenvale, 11548, NY, USA
Dana Perez
Department of Pathology and Laboratory Medicine, Northwell Health Laboratories, Greenvale, 11548, NY, USA
Paul Friedman
Department of Pathology and Laboratory Medicine, Northwell Health Laboratories, Greenvale, 11548, NY, USA
Won-Tak Choi
Department of Pathology, University of California San Francisco, San Francisco, 94143, CA, USA
Waqas Mahmud
Department of Biomedical Informatics, Stony Brook University, Stony Brook, 11794, NY, USA
Beatrice Knudsen
Department of Pathology, University of Utah School of Medicine, Utah, 84112, UT, USA
Gregory Zelinsky
Professor of Psychology and Computer Science, Stony Brook University
visual attention, visual search, object detection
Joel Saltz
SUNY Distinguished Professor and Chair of Biomedical Informatics, Stony Brook University
High End Computing, Systems Software, Biomedical Informatics, Pathology Informatics
Dimitris Samaras
Stony Brook University
Computer Vision, Machine Learning, Computer Graphics, Medical Imaging