🤖 AI Summary
Deep neural networks often exploit spurious correlations (i.e., shortcut features) in training data, leading to severe degradation in out-of-distribution (OOD) generalization. To address this, we propose Directional Jacobian Regularization in a disentangled latent space: we first identify latent shortcut directions via correlation analysis with the label, then inject anisotropic noise along these directions to explicitly enforce functional invariance of the classifier with respect to shortcut features. Unlike methods that rely on complex representation learning, our approach achieves fine-grained control over classifier sensitivity solely through gradient-level, direction-aware regularization. Evaluated on multiple standard shortcut-learning benchmarks, our method consistently outperforms existing approaches, achieving state-of-the-art OOD generalization performance and demonstrating both effectiveness and robustness across diverse distribution shifts.
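The summary does not spell out why injecting noise acts as Jacobian regularization; a standard second-order argument, sketched here under the assumption of small noise magnitude, makes the connection:

```latex
% Let f be the classifier acting on latent code z, v a unit shortcut
% direction, and \epsilon \sim \mathcal{N}(0, \sigma^2) the injected noise.
% A first-order Taylor expansion gives
f(z + \epsilon v) \approx f(z) + \epsilon \, J_f(z)\, v,
% so the expected squared perturbation of the classifier output is
\mathbb{E}_{\epsilon}\big[\, \| f(z + \epsilon v) - f(z) \|^2 \,\big]
  \approx \sigma^2 \, \| J_f(z)\, v \|^2 .
% Training under this noise therefore implicitly penalizes the
% directional Jacobian norm \| J_f(z) v \|, i.e. it enforces
% functional invariance of f along the shortcut direction v.
```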
📝 Abstract
Deep neural networks are prone to learning shortcuts: spurious, easily learned correlations in training data that cause severe failures in out-of-distribution (OOD) generalization. A dominant line of work seeks robustness by learning a robust representation, often explicitly partitioning the latent space into core and spurious components; this approach can be complex, brittle, and difficult to scale. We take a different approach: instead of learning a robust representation, we learn a robust function. We present a simple and effective training method that renders the classifier functionally invariant to shortcut signals. Our method operates in a disentangled latent space, which is essential because it isolates spurious and core features into distinct dimensions. This separation enables the identification of candidate shortcut features by their strong correlation with the label, which serves as a proxy for semantic simplicity. The classifier is then desensitized to these features by injecting targeted, anisotropic latent noise during training. We analyze this procedure as a targeted Jacobian regularization that forces the classifier to ignore spurious features and rely on more complex, core semantic signals. The result is state-of-the-art OOD performance on established shortcut-learning benchmarks.
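The two steps described above, selecting label-correlated latent dimensions and injecting noise only along them, can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: it assumes latent codes from some disentangled encoder are already available as an array, and all function names and thresholds are hypothetical.

```python
import numpy as np

def shortcut_dims(z, y, k=1):
    """Rank latent dimensions by |Pearson correlation| with the label and
    return the indices of the k most label-correlated ones; in a
    disentangled latent space these are candidate shortcut directions."""
    zc = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
    yc = (y - y.mean()) / (y.std() + 1e-8)
    corr = np.abs(zc.T @ yc) / len(y)   # per-dimension |correlation| with label
    return np.argsort(corr)[-k:]

def inject_anisotropic_noise(z, dims, sigma=1.0, rng=None):
    """Add Gaussian noise only along the selected dimensions, leaving
    all other (core) dimensions untouched."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = np.zeros_like(z)
    noise[:, dims] = rng.normal(0.0, sigma, size=(z.shape[0], len(dims)))
    return z + noise

# Toy check: dimension 0 is a near-perfect shortcut (it copies the label),
# while dimensions 1-3 are label-independent noise.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200).astype(float)
z = rng.normal(size=(200, 4))
z[:, 0] = y + 0.05 * rng.normal(size=200)

dims = shortcut_dims(z, y, k=1)                  # selects dimension 0
z_noisy = inject_anisotropic_noise(z, dims, sigma=2.0)
```

Training the classifier on `z_noisy` then penalizes any sensitivity along the selected dimension, which is the noise-injection view of the directional Jacobian regularization analyzed in the abstract.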