🤖 AI Summary
In whole-slide image (WSI) classification, existing methods struggle to model spatial dependencies among tissue patches: multiple instance learning (MIL) ignores structural relationships, graph neural networks (GNNs) often rely on static graph topologies, and conventional attention mechanisms lack spatial specificity. To address these limitations, we propose Deformable Graph MIL (DG-MIL), a framework that integrates deformable attention with dynamic graph learning. DG-MIL constructs a dynamic, weighted, directed graph from patch features and introduces learnable spatial offsets, informed by the real coordinates of each patch, so that graph aggregation can adaptively focus on morphologically relevant regions. This design relaxes the constraints of fixed graph structures and rigid attention patterns, widening the spatial context available to each node while preserving spatial specificity. Evaluated on four benchmark datasets (TCGA-COAD, BRACS, gastric intestinal metaplasia grading, and intestinal ROI classification), DG-MIL achieves state-of-the-art performance, demonstrating its efficacy in capturing complex tissue architectures in both WSIs and ROIs.
📝 Abstract
Accurate classification of Whole Slide Images (WSIs) and Regions of Interest (ROIs) is a fundamental challenge in computational pathology. While mainstream approaches often adopt Multiple Instance Learning (MIL), they struggle to capture the spatial dependencies among tissue structures. Graph Neural Networks (GNNs) have emerged as a solution to model inter-instance relationships, yet most rely on static graph topologies and overlook the physical spatial positions of tissue patches. Moreover, conventional attention mechanisms lack specificity, limiting their ability to focus on structurally relevant regions. In this work, we propose a novel GNN framework with deformable attention for pathology image analysis. We construct a dynamic weighted directed graph based on patch features, where each node aggregates contextual information from its neighbors via attention-weighted edges. Specifically, we incorporate learnable spatial offsets informed by the real coordinates of each patch, enabling the model to adaptively attend to morphologically relevant regions across the slide. This design significantly enhances the contextual field while preserving spatial specificity. Our framework achieves state-of-the-art performance on four benchmark datasets (TCGA-COAD, BRACS, gastric intestinal metaplasia grading, and intestinal ROI classification), demonstrating the power of deformable attention in capturing complex spatial structures in WSIs and ROIs.
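The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name, shape, and formula in it is an assumption made for illustration: the directed graph is rebuilt from the current patch features with a k-nearest-neighbour rule, edge weights come from a softmax over scaled dot-product similarity, and the "deformable" part is modelled as a per-node 2D offset that shifts the point around which spatially close neighbours are favoured. In a trained model the offsets would be predicted by a learnable layer; here they are passed in as a plain array.

```python
import numpy as np


def knn_directed_edges(features, k):
    """Return an (N, k) index array: row i lists the k nearest neighbours
    of node i in feature space (directed edges j -> i, no self-loops)."""
    sq = np.sum(features ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist2, np.inf)  # exclude self-loops
    return np.argsort(dist2, axis=1)[:, :k]


def deformable_graph_attention(features, coords, offsets, k=4, sigma=1.0):
    """One aggregation step (illustrative assumption, not the paper's layer).

    features: (N, D) patch embeddings
    coords:   (N, 2) real patch coordinates on the slide
    offsets:  (N, 2) per-node spatial offsets (learned in a real model)
    """
    features = np.asarray(features, dtype=float)
    coords = np.asarray(coords, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    n, d = features.shape

    # Dynamic graph: neighbours are recomputed from the current features.
    nbrs = knn_directed_edges(features, k)  # (N, k)
    nbr_feats = features[nbrs]              # (N, k, D)

    # Feature-based attention logits (scaled dot product).
    logits = np.einsum("nd,nkd->nk", features, nbr_feats) / np.sqrt(d)

    # Deformable part: each node attends around a shifted reference point,
    # penalising neighbours that lie far from coords + offsets.
    ref = coords + offsets                                      # (N, 2)
    dist2 = np.sum((coords[nbrs] - ref[:, None, :]) ** 2, axis=-1)
    logits = logits - dist2 / (2.0 * sigma ** 2)

    # Softmax over each node's incoming edges.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Attention-weighted aggregation of neighbour features.
    out = np.einsum("nk,nkd->nd", weights, nbr_feats)
    return out, weights, nbrs
```

Calling `deformable_graph_attention(features, coords, offsets)` returns the aggregated node features together with the edge weights and neighbour indices, which makes it easy to inspect which patches each node attends to. The Gaussian distance penalty and the k-nearest-neighbour rule are simple stand-ins chosen for readability.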