🤖 AI Summary
In weakly supervised classification of whole-slide images (WSIs), poor generalization stems from the absence of fine-grained annotations, inadequate modeling of spatial structure, and confounding factors such as color variation. To address these challenges, we propose GMIL-IT, a framework that integrates graph-based multiple instance learning (Graph-MIL) with causal interventional training. Our work is the first to systematically demonstrate that explicit graph-structured modeling alone significantly improves cross-domain generalization, which challenges the prevailing assumption that interventional training is strictly necessary for this. GMIL-IT jointly optimizes feature disentanglement and spatial relational modeling. Using several graph construction strategies (k-nearest neighbors, superpixels, and spatial distance), we validate the approach on multiple WSI benchmarks. Results show that Graph-MIL improves generalization over conventional MIL by 12.7%, and that GMIL-IT further mitigates confounding bias, yielding an 8.3% AUC gain under domain shift.
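Two of the graph construction strategies named above (k-nearest neighbors and spatial distance) can be illustrated with a minimal sketch. It assumes each WSI patch is represented only by the (x, y) pixel coordinate of its centre and that edges are undirected. The helpers `knn_edges` and `radius_edges` are hypothetical names, not part of the GMIL-IT code, and superpixel-based graphs, which need an image segmentation step, are not shown.

```python
import math

# Hypothetical helpers: patches are (x, y) centre coordinates; edges are
# undirected and stored once as (smaller_index, larger_index).

def knn_edges(coords, k):
    """Link each patch to its k nearest patches by Euclidean distance."""
    edges = set()
    for i, ci in enumerate(coords):
        # Sort every other patch by distance to patch i (ties broken by index).
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def radius_edges(coords, r):
    """Link every pair of patches whose centres lie within distance r."""
    edges = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= r:
                edges.add((i, j))
    return edges
```

On a 2×2 grid of 256-pixel tiles, `radius_edges(coords, 256)` links only horizontally and vertically adjacent tiles, while `knn_edges(coords, 1)` keeps a single nearest neighbour per tile and can therefore produce a sparser graph. The quadratic loops favour readability; at gigapixel scale a spatial index would normally replace them.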
📝 Abstract
Whole Slide Imaging (WSI), which produces high-resolution digital scans of pathology slides, has become the gold standard for cancer diagnosis, but its gigapixel resolution and the scarcity of annotated datasets present challenges for deep learning models. Multiple Instance Learning (MIL), a widely used weakly supervised approach, bypasses the need for patch-level annotations. However, conventional MIL methods overlook the spatial relationships between patches, which are crucial for tasks such as cancer grading and diagnosis. To address this, graph-based approaches have gained prominence by encoding spatial information as connections between nodes. Despite their potential, both MIL and graph-based models are vulnerable to learning spurious associations, such as color variations in WSIs, which undermines their robustness. In this dissertation, we conduct an extensive comparison of multiple graph construction techniques, MIL models, graph-MIL approaches, and interventional training, and we introduce a new framework, Graph-based Multiple Instance Learning with Interventional Training (GMIL-IT), for WSI classification. We evaluate the impact of each on model generalization through domain shift analysis and demonstrate that graph-based models alone achieve the generalization initially anticipated from interventional training. Our code is available at github.com/ritamartinspereira/GMIL-IT
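The contrast between conventional MIL and graph-based MIL can be made concrete with a minimal sketch. It assumes patch features have already been extracted. One round of mean neighbour aggregation over the patch graph injects spatial context, and attention-based MIL pooling (simplified here to a single fixed weight vector) collapses the bag of patches into one slide embedding, which a linear classifier then scores. All function names and weights are hypothetical illustrations rather than the GMIL-IT architecture, and the interventional training step is not modelled.

```python
import math

# Hypothetical Graph-MIL sketch with fixed weights (no training loop).

def mean_aggregate(features, edges):
    """One message-passing step: each patch averages its feature with its neighbours'."""
    n = len(features)
    neighbours = {i: [i] for i in range(n)}  # self-loop keeps the patch's own feature
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i]) for d in range(dim)]
        for i in range(n)
    ]

def attention_pool(features, w):
    """Attention MIL pooling: softmax over scores w . tanh(h_i), then a weighted sum."""
    scores = [sum(wd * math.tanh(hd) for wd, hd in zip(w, h)) for h in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(features[0])
    bag = [sum(a * h[d] for a, h in zip(attn, features)) for d in range(dim)]
    return bag, attn

def bag_logit(bag, classifier_w, bias=0.0):
    """Linear slide-level classifier applied to the pooled bag embedding."""
    return sum(c * b for c, b in zip(classifier_w, bag)) + bias
```

Skipping the `mean_aggregate` call recovers a conventional MIL model that treats the patches as an unordered set. Keeping it lets each patch's attention score depend on its spatial neighbours, which is the information the abstract argues conventional MIL discards.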