Spatial Blindness in Whole-Slide Multiple Instance Learning

📅 2026-05-17
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing whole-slide image (WSI) multiple instance learning (MIL) models exhibit “spatial blindness” on histopathology tasks that depend on tissue spatial structure: their predictions barely change when patch coordinates are permuted, so they do not make effective use of spatial information. To address this, the work proposes ResTopoMIL, which uses a two-stage training strategy to decouple appearance learning from spatial learning. First, a permutation-invariant prototype histogram is fitted and frozen; then a lightweight graph branch learns the spatial residual under a coordinate-shuffling constraint, restoring sensitivity to spatial structure. With only 1.15M parameters, ResTopoMIL improves classification and survival prediction across nine public WSI benchmarks and provides stronger localization evidence on CAMELYON-16, while keeping the architecture simple.
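To make "permutation-invariant prototype histogram" concrete, here is a minimal sketch. It assumes hard nearest-prototype assignment under squared Euclidean distance and a count-normalized histogram; the summary does not say how ResTopoMIL builds its histogram, so the function name, the toy prototypes, and the assignment rule are all illustrative assumptions.

```python
import random


def prototype_histogram(embeddings, prototypes):
    """Normalized histogram of nearest-prototype assignments.

    Hypothetical form: each patch embedding votes for its closest
    prototype (squared Euclidean distance); votes are divided by the
    number of patches. Patch order and coordinates are never used.
    """
    if not embeddings:
        raise ValueError("need at least one patch embedding")
    counts = [0] * len(prototypes)
    for e in embeddings:
        dists = [sum((a - b) ** 2 for a, b in zip(e, p)) for p in prototypes]
        counts[dists.index(min(dists))] += 1
    return [c / len(embeddings) for c in counts]


# Toy data (assumed for illustration): three prototypes, four patches.
prototypes = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
patches = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.0, 0.8)]

hist = prototype_histogram(patches, prototypes)

# Reordering the patches leaves the histogram unchanged.
shuffled = patches[:]
random.Random(0).shuffle(shuffled)
hist_shuffled = prototype_histogram(shuffled, prototypes)
```

Because the histogram only counts assignments, it captures what tissue is present but not where it lies, which is why a separate branch is needed for spatial structure.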
📝 Abstract
Whole-slide MIL models are often called context-aware once graphs, Transformers, or state-space modules are placed above patch embeddings. We show that this label can be deceptive. On pathology tasks where tissue architecture is part of the diagnostic signal, several strong MIL baselines retain nearly unchanged slide-level AUC after patch coordinates are permuted. Their predictions are accurate, but largely compositional. We refer to this failure mode as spatial blindness. Our explanation is optimization-based: dense appearance statistics are learned early under slide-level supervision, leaving weak gradients for sparse spatial relations. ResTopoMIL addresses the issue by first fitting a permutation-invariant prototype histogram and then freezing it while a lightweight graph branch learns the residual under a coordinate-shuffling constraint. The architecture is simple by design; the intervention is in how the spatial branch is trained. Across 9 public WSI benchmarks, ResTopoMIL improves classification and survival prediction with 1.15M parameters, restores sensitivity to coordinate perturbation, and gives stronger localization evidence on CAMELYON-16.
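The coordinate-permutation check described in the abstract can be illustrated with a small sketch. This is a toy under stated assumptions, not the paper's protocol: it compares per-slide scores rather than slide-level AUC, and the two scoring functions, the 10x10 grid, and the neighbour radius are all hypothetical stand-ins for real MIL models and slides.

```python
import random


def shuffle_coordinates(coords, seed=0):
    """Reassign the same set of coordinates to patches in random order."""
    permuted = list(coords)
    random.Random(seed).shuffle(permuted)
    return permuted


def compositional_score(features, coords):
    """Toy bag-level score that ignores coordinates (mean feature value)."""
    return sum(features) / len(features)


def spatial_score(features, coords, radius=1.5):
    """Toy layout-dependent score: fraction of positive patches that have
    another positive patch within `radius`."""
    positives = [c for f, c in zip(features, coords) if f > 0.5]
    if not positives:
        return 0.0
    clustered = 0
    for i, (x, y) in enumerate(positives):
        for j, (u, v) in enumerate(positives):
            if i != j and (x - u) ** 2 + (y - v) ** 2 <= radius ** 2:
                clustered += 1
                break
    return clustered / len(positives)


def permutation_gap(score_fn, features, coords, n_trials=20):
    """Mean absolute change in score when coordinates are shuffled."""
    base = score_fn(features, coords)
    gaps = [
        abs(base - score_fn(features, shuffle_coordinates(coords, seed=s)))
        for s in range(n_trials)
    ]
    return sum(gaps) / n_trials


# Toy slide: a 10x10 patch grid with a 2x2 cluster of positive patches.
grid = [(x, y) for x in range(10) for y in range(10)]
features = [1.0 if x < 2 and y < 2 else 0.0 for x, y in grid]

comp_gap = permutation_gap(compositional_score, features, grid)
spat_gap = permutation_gap(spatial_score, features, grid)
print(f"compositional gap: {comp_gap:.3f}, spatial gap: {spat_gap:.3f}")
```

A model whose gap stays at zero behaves like `compositional_score`: it may still be accurate, but its output carries no evidence that it uses tissue layout. That is the sense in which the abstract calls such predictions "largely compositional".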
Problem

Research questions and friction points this paper is trying to address.

spatial blindness
whole-slide imaging
multiple instance learning
tissue architecture
context-aware
Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial blindness
multiple instance learning
whole-slide image
ResTopoMIL
coordinate-shuffling constraint