PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weakly supervised object localization (WSOL) for histopathology images faces four main challenges: large activation bias in single-step Class Activation Mapping (CAM) methods, limited localization capability in two-step methods that depend on a frozen classifier, asynchronous convergence between the classification and localization tasks, and poor out-of-distribution (OOD) generalization. To address them, this work proposes the first pixel-level multi-task WSOL framework tailored to pathology images. The framework jointly optimizes image classification and foreground/background pixel classification within the pixel-feature space of a shared encoder. A lightweight PixelCAM module enables end-to-end training and integrates with both CNN and Transformer backbones without architectural modification. The pixel classifier is trained on pixel-level pseudo-labels distilled from a pretrained WSOL model, which improves localization robustness. Evaluated on multiple in-distribution and OOD histology datasets, the method achieves state-of-the-art performance and significantly mitigates over-activation, under-activation, and task asynchrony.

📝 Abstract
Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and the limited localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is tied to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications.
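
The joint training described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear pixel classifier `w_pix`, the weighting factor `lam`, the `IGNORE` marker for pixels without a pseudo-label, and the function name `joint_loss` are all assumptions made for illustration. The sketch only shows how an image-level cross-entropy and a pixel-level foreground/background cross-entropy could share the same encoder features.

```python
import numpy as np

IGNORE = -1  # hypothetical marker: pixel has no pseudo-label


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def joint_loss(features, w_pix, image_logits, image_label, pixel_labels, lam=1.0):
    """Image-level plus pixel-level cross-entropy on shared encoder features.

    features:     (H, W, D) pixel embeddings from the shared encoder
    w_pix:        (D, 2) weights of an assumed linear pixel classifier
                  (column 0 = background, column 1 = foreground)
    image_logits: (C,) class scores from the image classifier
    image_label:  int, ground-truth image class
    pixel_labels: (H, W) ints in {0, 1, IGNORE}, pseudo-labels
    lam:          assumed weight of the pixel term
    """
    # Image classification term.
    cls_loss = -np.log(softmax(image_logits)[image_label])

    # Pixel-wise foreground/background probabilities.
    pix_probs = softmax(features @ w_pix)  # (H, W, 2)

    # Pixel term, averaged over pseudo-labeled pixels only.
    mask = pixel_labels != IGNORE
    if mask.any():
        probs = pix_probs[mask]            # (N, 2)
        labels = pixel_labels[mask]        # (N,)
        picked = probs[np.arange(labels.size), labels]
        pix_loss = -np.log(picked).mean()
    else:
        pix_loss = 0.0

    # The foreground probability map serves as the localization map.
    fg_map = pix_probs[..., 1]
    return cls_loss + lam * pix_loss, fg_map
```

Because both terms depend on the same encoder output, a single gradient step updates classification and localization together, which is the property the abstract relies on to avoid asynchronous convergence.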

Problem

Research questions and friction points this paper is trying to address.

Addresses under- or over-activation in histology WSOL methods
Resolves asynchronous convergence between the classification and localization tasks
Improves localization accuracy for out-of-distribution histology datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task WSOL for simultaneous classification and localization
PixelCAM for pixel-wise foreground/background classification
Shared encoder for feature learning and ROI delineation
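
The abstract states that PixelCAM is trained on pixel pseudo-labels collected from a pretrained WSOL model, but this page does not say how they are selected. The sketch below shows one plausible scheme, stated here purely as an assumption: the most activated pixels of a CAM become foreground, the least activated become background, and the rest are ignored. The function name `pseudo_labels_from_cam`, the fractions `fg_frac` and `bg_frac`, and the `IGNORE` marker are hypothetical.

```python
import numpy as np

IGNORE = -1  # hypothetical marker: pixel is left unlabeled


def pseudo_labels_from_cam(cam, fg_frac=0.1, bg_frac=0.1):
    """Turn a CAM into sparse foreground/background pseudo-labels.

    cam:     (H, W) activation map from a pretrained WSOL model
    fg_frac: assumed fraction of top-activated pixels labeled foreground (1)
    bg_frac: assumed fraction of least-activated pixels labeled background (0)
    Returns an (H, W) integer array with values in {0, 1, IGNORE}.
    """
    flat = cam.ravel()
    n = flat.size
    n_fg = max(1, int(n * fg_frac))
    n_bg = max(1, int(n * bg_frac))

    # Pixel indices sorted by ascending activation.
    order = np.argsort(flat, kind="stable")

    labels = np.full(n, IGNORE, dtype=np.int64)
    labels[order[:n_bg]] = 0    # least activated pixels: background
    labels[order[-n_fg:]] = 1   # most activated pixels: foreground
    return labels.reshape(cam.shape)
```

Leaving mid-range pixels unlabeled keeps uncertain regions out of the pixel loss, so errors in the pretrained CAM are less likely to be distilled into the pixel classifier.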