AI Summary
The layer-wise distribution of shortcuts (spurious correlations) in deep networks remains poorly understood, hindering principled mitigation strategies. To address this, we propose counterfactual inter-layer attribution to quantify each layer's contribution to generalization degradation under clean versus biased data, conducting systematic analysis across VGG, ResNet, DeiT, and ConvNeXt on CIFAR-10, Waterbirds, and CelebA. We discover a cross-layer collaborative shortcut learning pattern: shallow layers predominantly encode spurious features, while deeper layers selectively forget core discriminative features from clean data. Leveraging this insight, we construct multi-dimensional perturbation axes for precise shortcut localization. Experiments reveal that shortcut effects permeate the entire network and depend strongly on both dataset and architecture, rendering generic mitigation strategies ineffective; customized, architecture- and task-aware interventions are thus essential. This work establishes a novel paradigm for mechanistic modeling of shortcuts and targeted intervention.
Abstract
Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact of shortcuts on feature representations remains understudied, obstructing the design of principled shortcut-mitigation methods. To overcome this limitation, we investigate the layer-wise localization of shortcuts in deep models. Our novel experiment design quantifies each layer's contribution to the accuracy degradation caused by a shortcut-inducing skew, via counterfactual training on clean and skewed datasets. We employ our design to study shortcuts on the CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different network parts play different roles in this process: shallow layers predominantly encode spurious features, while deeper layers predominantly forget core features that are predictive on clean data. We also analyze differences in localization and describe their principal axes of variation. Finally, our analysis of layer-wise shortcut-mitigation strategies suggests that general methods are hard to design, supporting dataset- and architecture-specific approaches instead.
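To make the counterfactual layer-swap idea concrete, here is a minimal numpy sketch. It assumes two ReLU-MLP weight lists, one trained on clean data and one on skewed data, and attributes a layer's role by replacing that single layer in the clean model with its skewed counterpart and measuring the accuracy drop on clean data. The function names, the toy MLP, and the swap-one-layer protocol are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def accuracy(layers, X, y):
    """Forward pass through a bias-free ReLU MLP given as a list of
    weight matrices; returns top-1 accuracy on (X, y)."""
    h = X
    for W in layers[:-1]:
        h = np.maximum(h @ W, 0.0)  # hidden layers with ReLU
    logits = h @ layers[-1]         # linear output layer
    return float((logits.argmax(axis=1) == y).mean())

def layerwise_attribution(clean_layers, skewed_layers, X, y):
    """Counterfactual swap (illustrative): replace one clean layer at a
    time with its skewed-trained counterpart and record the resulting
    accuracy drop on clean evaluation data."""
    base = accuracy(clean_layers, X, y)
    drops = []
    for i in range(len(clean_layers)):
        hybrid = list(clean_layers)
        hybrid[i] = skewed_layers[i]
        drops.append(base - accuracy(hybrid, X, y))
    return drops

# Toy 2-layer example: the clean model classifies perfectly; the skewed
# model differs only in its output layer, which swaps the two classes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
y = np.array([0, 1, 0, 1])
clean = [np.eye(2), np.eye(2)]
skewed = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
drops = layerwise_attribution(clean, skewed, X, y)
```

In this toy setup only the swapped output layer degrades clean accuracy, so `drops` is `[0.0, 1.0]`; in real networks the paper's finding is that such drops are spread across many layers rather than concentrated in one.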