🤖 AI Summary
Self-supervised learning (SSL) often suffers from degraded out-of-distribution (OOD) generalization due to reliance on spurious correlations in data. This paper pioneers the integration of causal inference into SSL-based OOD generalization analysis, proposing the Post-Intervention Distribution (PID) framework grounded in structural causal models. We theoretically establish that SSL achieves the optimal worst-case OOD generalization bound under PID. To operationalize this insight, we design a causal-aware batch sampling strategy that explicitly breaks spurious correlations while satisfying PID constraints. Our approach is agnostic to SSL objectives and is validated across prominent paradigms—including SimCLR and Barlow Twins. Extensive experiments on standard OOD benchmarks (e.g., ImageNet-C, ObjectNet, DomainBed) demonstrate significant improvements in robustness, consistently outperforming state-of-the-art SSL methods.
📝 Abstract
In this paper, we focus on the out-of-distribution (OOD) generalization of self-supervised learning (SSL). By analyzing mini-batch construction during the SSL training phase, we first give a plausible explanation for why SSL exhibits OOD generalization. Then, from the perspective of data generation and causal inference, we show that SSL learns spurious correlations during training, which reduces OOD generalization. To address this issue, we propose a post-intervention distribution (PID) grounded in the structural causal model. PID describes a scenario in which the spurious variable and the label variable are mutually independent. Moreover, we demonstrate that if each mini-batch during SSL training satisfies PID, the resulting SSL model achieves optimal worst-case OOD performance. This motivates us to develop a batch sampling strategy that enforces PID constraints by learning a latent variable model. Through theoretical analysis, we establish the identifiability of the latent variable model and validate the proposed sampling strategy, and experiments on various downstream OOD tasks further demonstrate its effectiveness.
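The paper's actual sampler relies on a learned latent variable model to infer the spurious attribute, which is not reproduced here. As a minimal sketch of the core idea, assuming spurious attributes have already been inferred for each sample, a batch can be drawn so that every (label, spurious) cell contributes equally, making the two variables independent within the batch (all names below are hypothetical, not from the paper):

```python
import random
from collections import Counter, defaultdict

def pid_batch_sampler(samples, batch_size, seed=0):
    """Draw a mini-batch whose empirical joint over (label, spurious)
    factorizes: each (label, spurious) cell contributes the same number
    of samples, so the spurious variable is independent of the label
    within the batch (the PID constraint, in its simplest discrete form)."""
    rng = random.Random(seed)
    # Group sample indices by their (label, spurious) cell.
    cells = defaultdict(list)
    for idx, (label, spurious) in enumerate(samples):
        cells[(label, spurious)].append(idx)
    # Equal quota per cell; sampling with replacement handles small cells.
    per_cell = max(1, batch_size // len(cells))
    batch = []
    for members in cells.values():
        batch.extend(rng.choices(members, k=per_cell))
    rng.shuffle(batch)
    return batch

# Toy data: 2 labels x 2 spurious attributes, 10 samples per cell.
samples = [(l, s) for l in (0, 1) for s in ("a", "b") for _ in range(10)]
batch = pid_batch_sampler(samples, batch_size=8)
cell_counts = Counter(samples[i] for i in batch)
```

With four cells and `batch_size=8`, each cell contributes exactly two samples, so the batch-level mutual information between label and spurious attribute is zero; the SSL objective itself (SimCLR, Barlow Twins, etc.) is untouched, which is why the strategy is objective-agnostic.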