AI Summary
In weakly supervised classification of whole-slide images (WSIs), the extreme image resolution and scarcity of fine-grained annotations lead to instance ambiguity and bag-level semantic inconsistency. To address these challenges, we propose a dual-stream attention-guided framework. Our method introduces a novel multi-scale attention-based pseudo-label generation mechanism, employs a lightweight shared VSSMamba encoder to model long-range dependencies, and incorporates a Fusion Attention and Semantic Alignment (FASA) module for cross-stream feature co-optimization. Furthermore, we design a dual-stream mutual consistency hybrid loss and establish an end-to-end teacher-student collaborative training paradigm. Evaluated on CIFAR-10, NCT-CRC, and TCGA-Lung datasets, our approach consistently outperforms state-of-the-art multiple-instance learning (MIL) models, achieving superior classification accuracy and enhanced robustness under weak supervision.
Abstract
Whole-slide images (WSIs) are critical for cancer diagnosis due to their ultra-high resolution and rich semantic content. However, their massive size and the limited availability of fine-grained annotations pose substantial challenges for conventional supervised learning. We propose DSAGL (Dual-Stream Attention-Guided Learning), a novel weakly supervised classification framework that combines a teacher-student architecture with a dual-stream design. DSAGL explicitly addresses instance-level ambiguity and bag-level semantic inconsistency by generating multi-scale attention-based pseudo labels and using them to guide instance-level learning. A shared lightweight encoder (VSSMamba) enables efficient long-range dependency modeling, while a fusion-attentive module (FASA) enhances focus on sparse but diagnostically relevant regions. We further introduce a hybrid loss to enforce mutual consistency between the two streams. Experiments on the CIFAR-10, NCT-CRC, and TCGA-Lung datasets demonstrate that DSAGL consistently outperforms state-of-the-art multiple-instance learning (MIL) baselines, achieving superior discriminative performance and robustness under weak supervision.
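To make the idea of attention-based pseudo labels concrete, the following is a minimal single-scale sketch of generic attention pooling over a bag of instances, followed by a simple top-k rule that turns attention weights into instance-level pseudo labels. It is not the DSAGL implementation: the function names (`attention_pool`, `pseudo_labels`), the `top_frac` parameter, and the top-k rule itself are assumptions chosen for illustration, and the multi-scale attention, the VSSMamba encoder, and the FASA module are all omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instance_feats, attn_scores):
    """Aggregate instance features into one bag feature.

    The bag feature is the attention-weighted average of the instance
    features, as in generic attention-based MIL pooling.
    """
    weights = softmax(attn_scores)
    dim = len(instance_feats[0])
    bag = [
        sum(w * feat[d] for w, feat in zip(weights, instance_feats))
        for d in range(dim)
    ]
    return bag, weights


def pseudo_labels(weights, bag_label, top_frac=0.25):
    """Hypothetical rule for deriving instance pseudo labels.

    Every instance in a negative bag is labelled 0. In a positive bag,
    the top `top_frac` most-attended instances are labelled 1.
    """
    n = len(weights)
    if bag_label == 0:
        return [0] * n
    k = max(1, int(round(top_frac * n)))
    top = sorted(range(n), key=lambda i: weights[i], reverse=True)[:k]
    labels = [0] * n
    for i in top:
        labels[i] = 1
    return labels


# Toy bag of four 2-D instance features; the third gets the highest score.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scores = [0.0, 0.0, 2.0, 0.0]
bag_feat, attn = attention_pool(feats, scores)
labels_pos = pseudo_labels(attn, bag_label=1)
labels_neg = pseudo_labels(attn, bag_label=0)
```

In this toy bag only the third instance receives a positive pseudo label when the bag is positive, and none do when it is negative. In a teacher-student setup, labels of this kind would typically come from the teacher's attention and supervise the student's instance classifier; how DSAGL does this across scales is not specified in the abstract.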