Learning domain-invariant features through channel-level sparsification for Out-of-Distribution Generalization

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of deep models to spurious, domain-specific features under out-of-distribution (OOD) scenarios, which undermines generalization. To mitigate this, the authors propose a channel-wise causal masking strategy combined with a Hierarchical Causal Dropout mechanism to explicitly disentangle causal and non-causal features at the representation level. Furthermore, they introduce a Matrix-based Mutual Information optimization objective and enhance the VICReg regularizer with StyleMix augmentation to effectively decouple semantic content from stylistic variations. The resulting framework achieves substantial improvements over existing methods across multiple OOD benchmarks, demonstrating superior and more stable cross-domain generalization performance.

📝 Abstract
Out-of-Distribution (OOD) generalization has become a primary metric for evaluating image analysis systems. Since deep learning models tend to capture domain-specific context, they often develop shortcut dependencies on these non-causal features, leading to inconsistent performance across different data sources. Current techniques, such as invariance learning, attempt to mitigate this. However, they struggle to isolate highly mixed features within deep latent spaces. This limitation prevents them from fully resolving the shortcut learning problem. In this paper, we propose Hierarchical Causal Dropout (HCD), a method that uses channel-level causal masks to enforce feature sparsity. This approach allows the model to separate causal features from spurious ones, effectively performing a causal intervention at the representation level. The training is guided by a Matrix-based Mutual Information (MMI) objective to minimize the mutual information between latent features and domain labels, while simultaneously maximizing the information shared with class labels. To ensure stability, we incorporate a StyleMix-driven VICReg module, which prevents the masks from accidentally filtering out essential causal data. Experimental results on OOD benchmarks show that HCD performs better than existing top-tier methods.
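The core idea of channel-level causal masking can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the authors' implementation: it assumes a per-channel importance score (which HCD would learn during training) and keeps only the top-scoring channels, zeroing out the rest to enforce the sparsity the abstract describes.

```python
import numpy as np

def channel_causal_mask(features, scores, keep_ratio=0.5):
    """Suppress low-importance channels via a hard binary mask.

    features: (batch, channels) array of channel activations
    scores:   (channels,) array of assumed per-channel causal scores
    keep_ratio: fraction of channels to retain
    """
    k = max(1, int(keep_ratio * scores.size))
    keep = np.argsort(scores)[-k:]      # indices of the top-k channels
    mask = np.zeros_like(scores)
    mask[keep] = 1.0                    # 1 = kept (causal), 0 = dropped (spurious)
    return features * mask

feats = np.array([[1.0, 2.0, 3.0, 4.0]])
scores = np.array([0.9, 0.1, 0.8, 0.2])  # hypothetical learned scores
masked = channel_causal_mask(feats, scores, keep_ratio=0.5)
# channels 0 and 2 survive: [[1.0, 0.0, 3.0, 0.0]]
```

In the actual method the mask would be learned jointly with the MMI objective rather than derived from fixed scores; this sketch only shows the masking operation itself.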
Problem

Research questions and friction points this paper is trying to address.

Out-of-Distribution Generalization
Domain Invariance
Shortcut Learning
Causal Features
Feature Sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Causal Dropout
channel-level sparsification
causal intervention
Matrix-based Mutual Information
Out-of-Distribution Generalization