🤖 AI Summary
Saliency methods in multi-class settings exhibit a pervasive class-insensitivity flaw: explanations for a given input remain nearly invariant across different target classes, undermining their reliability and causal credibility. This work systematically exposes that limitation with a diagnostic test of class sensitivity and shows the failure persists across architectures and datasets, indicating a structural rather than model-specific problem. Motivated by these findings, the authors propose the Contrastive Activation-based Saliency Explanation (CASE) framework, which models inter-class activation differences to isolate features uniquely discriminative for the predicted class. Evaluated with the proposed diagnostic and a perturbation-based fidelity test, CASE yields explanations that are both faithful and more class-specific than those of existing saliency methods.
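The paper's exact formulation of CASE is not reproduced here, but the core contrastive idea can be illustrated with a minimal NumPy sketch: instead of attributing a single class logit, attribute the *difference* between two class logits. For a linear model the gradient of a logit with respect to the input is just a weight row, which makes it easy to see how per-class saliency can be nearly identical while the contrastive map isolates what separates the classes. All names and the toy model below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy linear "model": logits = W @ x. For a linear model, the gradient of
# logit c w.r.t. the input is simply the weight row W[c], so saliency maps
# can be read off directly. (Illustrative only -- not the CASE method.)
rng = np.random.default_rng(0)
shared = rng.normal(size=32)                        # features both classes use
W = np.stack([shared + 0.1 * rng.normal(size=32),   # class 0 weights
              shared + 0.1 * rng.normal(size=32)])  # class 1 weights

def saliency(c):
    """Standard gradient saliency for class c (d logit_c / d x)."""
    return W[c]

def contrastive_saliency(c, c_prime):
    """Attribute the logit *difference* logit_c - logit_c' instead."""
    return W[c] - W[c_prime]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Standard maps for the two classes are nearly identical here
# (class-insensitive), while the contrastive map suppresses the shared
# component and keeps only the class-separating directions.
cos_standard = cosine(saliency(0), saliency(1))            # typically near 1
cos_contrast = cosine(contrastive_saliency(0, 1), shared)  # typically near 0
print(cos_standard, cos_contrast)
```

The design point is that the shared component `shared` dominates both weight rows, so single-logit saliency barely changes with the target class; subtracting the contrast class's gradient removes it.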
📝 Abstract
Saliency methods are widely used to visualize which input features are deemed relevant to a model's prediction. However, their visual plausibility can obscure critical limitations. In this work, we propose a diagnostic test for class sensitivity: a method's ability to distinguish between competing class labels on the same input. Through extensive experiments, we show that many widely used saliency methods produce nearly identical explanations regardless of the class label, calling into question their reliability. We find that class-insensitive behavior persists across architectures and datasets, suggesting the failure mode is structural rather than model-specific. Motivated by these findings, we introduce CASE, a contrastive explanation method that isolates features uniquely discriminative for the predicted class. We evaluate CASE using the proposed diagnostic and a perturbation-based fidelity test, and show that it produces faithful and more class-specific explanations than existing methods.
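The class-sensitivity diagnostic described above can be sketched as follows: ask a saliency method for explanations of two competing classes on the same input and measure how similar they are; near-identical maps flag class-insensitive behavior. The function names, the cosine similarity measure, and the toy explainers are hypothetical choices for illustration, not the paper's exact protocol.

```python
import numpy as np

def class_sensitivity_score(explain, x, class_a, class_b):
    """Hypothetical diagnostic: 1 - cosine similarity between the
    explanations a method produces for two competing classes on the
    same input. A score near 0 flags class-insensitive behavior."""
    sa = explain(x, class_a).ravel()
    sb = explain(x, class_b).ravel()
    cos = sa @ sb / (np.linalg.norm(sa) * np.linalg.norm(sb) + 1e-12)
    return 1.0 - cos

# A deliberately class-insensitive explainer: ignores the target class.
insensitive = lambda x, c: np.abs(x)
# A class-aware explainer: its output depends on the class index.
sensitive = lambda x, c: x if c == 0 else -x

x = np.random.default_rng(1).normal(size=16)
score_bad = class_sensitivity_score(insensitive, x, 0, 1)   # ~0: fails test
score_good = class_sensitivity_score(sensitive, x, 0, 1)    # ~2: passes test
print(score_bad, score_good)
```

Applied to a real model, `explain` would be a method such as Grad-CAM or Integrated Gradients targeted at each class in turn; the paper's finding is that many such methods score like the insensitive explainer here.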