CAPRMIL: Context-Aware Patch Representations for Multiple Instance Learning

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the reliance on computationally expensive attention-based aggregators in weakly supervised learning for whole-slide images (WSIs) in computational pathology, this paper proposes a context-aware, lightweight multiple-instance learning (MIL) framework. Methodologically, it freezes the ViT patch encoder, introduces learnable global morphology-aware tokens, and, as its main novelty, presents the first integration of neural partial differential equation (PDE) solver principles into MIL to explicitly decouple correlation modeling from aggregation. Global contextual enhancement is achieved via linear-complexity token injection, eliminating costly attention aggregators. Evaluated on multiple public pathology datasets, the method achieves state-of-the-art slide-level classification performance while reducing model parameters by 48%-92.8% and inference FLOPs by 52%-99%, and significantly lowering GPU memory consumption and training time compared to existing approaches.
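The linear-complexity token injection described above can be sketched as a two-step attention exchange: a small set of K learnable global tokens first reads all N patch embeddings, then writes the gathered context back to each patch, so the cost scales as O(N·K) rather than O(N²). This is a hypothetical sketch under assumed dimensions, not the authors' exact CAPRMIL implementation:

```python
import torch
import torch.nn as nn

class GlobalTokenContext(nn.Module):
    """Sketch of linear-complexity global context injection (assumed
    design, not the official CAPRMIL code): K learnable tokens gather
    slide-level context from N frozen patch features, then redistribute
    it, keeping attention cost O(N*K) instead of O(N^2)."""

    def __init__(self, dim: int = 384, num_tokens: int = 8, num_heads: int = 4):
        super().__init__()
        # Learnable global morphology-aware tokens (1, K, dim).
        self.tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        self.gather = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scatter = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (1, N, dim) embeddings from a frozen ViT patch encoder.
        t, _ = self.gather(self.tokens, patches, patches)  # tokens read patches: O(N*K)
        ctx, _ = self.scatter(patches, t, t)               # patches read tokens: O(N*K)
        return patches + ctx                               # context-aware patch embeddings

feats = torch.randn(1, 1000, 384)  # a bag of 1000 patch embeddings
out = GlobalTokenContext()(feats)
print(out.shape)                   # same shape as the input bag
```

Because the bag only ever attends through the fixed-size token set, memory and FLOPs grow linearly with the number of patches, which is what makes gigapixel slides tractable.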

📝 Abstract
In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of WSIs and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce a novel setting for MIL methods, inspired by advances in Neural Partial Differential Equation (PDE) Solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the burden of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features -- extracted using a frozen patch encoder -- into a small set of global context/morphology-aware tokens and applying multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by 48%-92.8% versus SOTA MILs, lowering FLOPs during inference by 52%-99%, and ranking among the best models on GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis. Our code is available at https://github.com/mandlos/CAPRMIL
Problem

Research questions and friction points this paper is trying to address.

Attention-based MIL aggregators are computationally expensive at the gigapixel scale of WSIs.
Correlation learning is entangled with the aggregator in existing MIL frameworks.
Slide-level performance must be preserved while cutting trainable parameters, FLOPs, and memory.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates context-aware patch embeddings via global tokens
Uses multi-head self-attention with linear computational complexity
Reduces parameters and FLOPs significantly with simple Mean aggregator
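Because correlation learning happens before pooling, the downstream aggregator can be trivially simple. A minimal sketch of the Mean MIL head mentioned above (dimensions and class count are assumptions for illustration):

```python
import torch
import torch.nn as nn

class MeanMIL(nn.Module):
    """Minimal mean-pooling MIL head (illustrative sketch; the feature
    dimension and class count are assumed, not taken from the paper):
    average the context-aware patch embeddings into one slide-level
    vector, then classify with a single linear layer."""

    def __init__(self, dim: int = 384, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (1, N, dim) context-aware instance features.
        bag = patch_embeddings.mean(dim=1)   # (1, dim) slide representation
        return self.classifier(bag)          # (1, num_classes) slide logits

logits = MeanMIL()(torch.randn(1, 500, 384))
print(logits.shape)
```

With mean pooling the aggregator contributes only a single linear layer of parameters, which is where most of the reported parameter and FLOP savings over attention-based pooling would come from.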