🤖 AI Summary
Whole-slide image (WSI) classification faces three challenges: modeling the spatial relationships among instances, the high computational cost of long-sequence dependency modeling, and mismatched sequence lengths between training and inference. To address them, this paper proposes an efficient and scalable multiple-instance learning (MIL) framework. The method explicitly encodes the intrinsic spatial layout of instances via learnable coordinate-based positional embeddings. It further introduces an MLP-based Spatial-Aware Correlation (SAC) block that captures global instance dependencies in linear time, requiring no custom CUDA kernels and thus combining strong performance with deployment efficiency. Evaluated on the CAMELYON16, TCGA-LUNG, and TCGA-BRCA benchmarks, the approach achieves state-of-the-art classification accuracy, outperforming existing MIL methods.
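The coordinate-based positional embedding can be pictured with a minimal sketch. The module below is a hypothetical illustration, not the paper's released implementation: it embeds each instance's normalized (x, y) slide coordinates with a small learnable MLP, so the encoding depends on spatial position rather than sequence index and therefore applies unchanged to sequences longer than those seen in training.

```python
import torch
import torch.nn as nn

class CoordinatePositionalEncoding(nn.Module):
    """Hypothetical sketch of a coordinate-based positional embedding.

    Instead of indexing a table by sequence position, we map each
    instance's (x, y) coordinates within the slide through a small MLP.
    Because the encoding is a function of spatial location only, slides
    with more instances than any training slide pose no extrapolation
    problem.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(2, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, feats: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) instance features; coords: (N, 2) patch (x, y) positions.
        # Normalize coordinates per slide so the MLP sees values in [0, 1].
        coords = coords.float()
        coords = coords / coords.amax(dim=0, keepdim=True).clamp(min=1.0)
        return feats + self.proj(coords)
```

Adding the embedding to the features (rather than concatenating) keeps the instance dimensionality unchanged for downstream blocks.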
📝 Abstract
We propose Spatial-Aware Correlated Multiple Instance Learning (SAC-MIL) for WSI classification. SAC-MIL consists of a positional encoding module that encodes position information and a SAC block that performs full instance correlation. The positional encoding module uses each instance's coordinates within the slide, rather than its index in the input WSI sequence, to encode spatial relationships; this also handles the length extrapolation issue that arises when training and testing sequences differ in length. The SAC block is an MLP-based module that performs full instance correlation in linear time with respect to the sequence length. Owing to its simple MLP structure, it is easy to deploy: unlike Transformer-based methods for WSI classification, it requires no custom CUDA kernels. SAC-MIL achieves state-of-the-art performance on the CAMELYON16, TCGA-LUNG, and TCGA-BRCA datasets. The code will be released upon acceptance.
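One way such a linear-time, MLP-only correlation can work is sketched below. This is an illustrative stand-in, not the paper's SAC block: each instance is gated by a globally pooled context vector, so every instance interacts with all others through O(N) operations built entirely from standard linear layers, with nothing that would need a custom CUDA kernel.

```python
import torch
import torch.nn as nn

class GlobalContextMLPBlock(nn.Module):
    """Illustrative linear-time instance-correlation block (assumed design,
    not the paper's exact SAC block).

    A context vector is mean-pooled over all N instances (one O(N) pass),
    then broadcast back and combined with a per-instance gate. Every
    instance thus conditions on every other instance without the O(N^2)
    cost of pairwise attention.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.to_context = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) instance features for one slide.
        h = self.norm(x)
        context = self.to_context(h).mean(dim=0, keepdim=True)  # (1, dim), O(N) pool
        gated = torch.sigmoid(self.gate(h)) * context           # broadcast to all N instances
        return x + self.out(gated)                              # residual update
```

Because the block is plain `nn.Linear`/`nn.LayerNorm` composition, it runs on any backend PyTorch supports, which is the deployment advantage the abstract attributes to MLP-based designs.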