🤖 AI Summary
This study systematically investigates the dual impact of model pruning on neural network interpretability, quantifying fidelity, sparsity, and semantic consistency across two interpretability dimensions: low-level saliency maps and high-level concept representations. Using magnitude-based pruning and fine-tuning on a ResNet-18 trained on ImageNette, we analyze saliency via Vanilla Gradients and Integrated Gradients, and extract human-aligned concepts using CRAFT. Results show that light-to-moderate pruning (≤40% sparsity) improves saliency-map focus, enhances concept disentanglement, and strengthens alignment with human cognition. In contrast, aggressive pruning, even when it preserves predictive accuracy, induces feature entanglement, saliency distortion, and semantic degradation of learned concepts. Crucially, this work provides the first empirical evidence of "performance–interpretability decoupling": model accuracy and interpretability do not co-vary monotonically under pruning. These findings establish theoretical foundations and practical design boundaries for interpretable model compression in trustworthy AI.
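The magnitude-based pruning referred to above can be sketched in a few lines: weights whose absolute values fall in the smallest fraction are zeroed out, and the model is then fine-tuned. Below is a minimal numpy sketch of this idea on a single weight array; the function name and threshold logic are illustrative, not the study's actual implementation (which prunes a ResNet-18 inside a deep-learning framework and fine-tunes afterward).

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero out the fraction `sparsity`
    of entries with the smallest absolute values (illustrative sketch)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice this is applied per layer (or globally across layers) with framework utilities such as PyTorch's `torch.nn.utils.prune.l1_unstructured`, and the surviving weights are fine-tuned to recover accuracy.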
📝 Abstract
Prior work has shown that neural networks can be heavily pruned while preserving performance, but the impact of pruning on model interpretability remains unclear. In this work, we investigate how magnitude-based pruning followed by fine-tuning affects both low-level saliency maps and high-level concept representations. Using a ResNet-18 trained on ImageNette, we compare post-hoc explanations from Vanilla Gradients (VG) and Integrated Gradients (IG) across pruning levels, evaluating sparsity and faithfulness. We further apply CRAFT-based concept extraction to track changes in the semantic coherence of learned concepts. Our results show that light-to-moderate pruning improves saliency-map focus and faithfulness while retaining distinct, semantically meaningful concepts. In contrast, aggressive pruning merges heterogeneous features, reducing saliency-map sparsity and concept coherence despite maintaining accuracy. These findings suggest that while pruning can shape internal representations toward more human-aligned attention patterns, excessive pruning undermines interpretability.
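Integrated Gradients, one of the two attribution methods compared above, attributes a prediction to input features by averaging gradients along a straight-line path from a baseline to the input. A minimal numpy sketch follows; the midpoint Riemann sum, step count, and the analytic toy gradient in the test are illustrative assumptions, not the paper's setup (which computes IG on image pixels via automatic differentiation).

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Approximate IG_i(x) = (x_i - x'_i) * ∫_0^1 ∂F/∂x_i at x' + α(x - x') dα
    using a midpoint Riemann sum over `steps` points on the path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    avg_grad = total / steps
    return (x - baseline) * avg_grad
```

A useful sanity check is the completeness axiom: the attributions should sum to F(x) − F(baseline). For F(x) = Σ x_i² with gradient 2x and a zero baseline, the attributions sum to F(x) exactly.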