🤖 AI Summary
Deep neural networks often suffer degraded out-of-distribution (OOD) generalization because they rely on spurious correlations between non-causal features and labels. Existing debiasing methods require manual annotation of these spurious correlations, limiting their practical applicability. This paper proposes the first fully self-guided post-hoc debiasing framework, requiring no external supervision: it quantifies neuron-level sensitivity via latent-space probing, uses gradient-driven analysis to identify and dynamically modulate the critical neurons responsible for spurious predictions, and enforces a theoretically grounded debiasing regularization. Because the method intervenes directly in the decision-making process at the neuron level, it offers both interpretability and theoretical grounding. Experiments across ResNet and Vision Transformer (ViT) architectures, on both image and text modalities, demonstrate substantial improvements in OOD robustness: accuracy attributable to spurious correlations drops by 32.7% on average, while discriminative capability on genuine features concurrently increases.
📝 Abstract
Deep neural networks often develop spurious bias, a reliance on correlations between non-essential features and classes for predictions. For example, a model may identify objects based on frequently co-occurring backgrounds rather than intrinsic features, resulting in degraded performance on data lacking these correlations. Existing mitigation approaches typically depend on external annotations of spurious correlations, which may be difficult to obtain and may not correspond to the spurious bias actually present in a model. In this paper, we take a step towards self-guided mitigation of spurious bias by proposing NeuronTune, a post hoc method that directly intervenes in a model's internal decision process. Our method probes a model's latent embedding space to identify and regulate neurons that lead to spurious prediction behaviors. We theoretically justify our approach and show that it brings the model closer to an unbiased one. Unlike previous methods, NeuronTune operates without requiring spurious correlation annotations, making it a practical and effective tool for improving model robustness. Experiments across different architectures and data modalities demonstrate that our method significantly mitigates spurious bias in a self-guided way.
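The high-level recipe described in the abstract, probing neuron-level sensitivity in the latent space and then modulating the most sensitive neurons, can be illustrated with a minimal toy sketch. This is not the paper's implementation: the network, the gradient-times-activation sensitivity score, and the `k`/`alpha` masking parameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy classifier: one hidden layer (all names illustrative).
W1 = rng.normal(size=(16, 8))    # input -> hidden weights
W2 = rng.normal(size=(8, 2))     # hidden -> logits weights
X = rng.normal(size=(32, 16))    # a batch of inputs
y = rng.integers(0, 2, size=32)  # class labels

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Forward pass with ReLU hidden activations.
H = np.maximum(X @ W1, 0.0)      # (32, 8) hidden activations
P = softmax(H @ W2)              # (32, 2) class probabilities

# Gradient of the cross-entropy loss w.r.t. the hidden activations.
dlogits = P.copy()
dlogits[np.arange(len(y)), y] -= 1.0
dH = dlogits @ W2.T              # (32, 8)

# Neuron-level sensitivity: mean |gradient x activation| over the batch
# (one of several plausible probing scores).
sensitivity = np.mean(np.abs(dH * H), axis=0)  # (8,)

# Dampen the top-k most sensitive neurons with a multiplicative mask.
k, alpha = 2, 0.1
mask = np.ones(8)
mask[np.argsort(sensitivity)[-k:]] = alpha

# "Debiased" forward pass: modulate hidden activations before the head.
P_debiased = softmax((H * mask) @ W2)
```

In a real model the same idea would typically be applied post hoc at a chosen intermediate layer (e.g. via forward hooks), with the sensitivity score and modulation strength chosen per the paper's procedure rather than fixed constants as here.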