🤖 AI Summary
Self-supervised learning (SSL) often suffers from degraded out-of-distribution (OOD) generalization due to reliance on spurious correlations in data. This paper pioneers the integration of causal inference into SSL-based OOD generalization analysis, proposing the Post-Intervention Distribution (PID) framework grounded in structural causal models. We theoretically establish that SSL achieves the optimal worst-case OOD generalization bound under PID. To operationalize this insight, we design a causal-aware batch sampling strategy that explicitly breaks spurious correlations while satisfying PID constraints. Our approach is agnostic to SSL objectives and is validated across prominent paradigms—including SimCLR and Barlow Twins. Extensive experiments on standard OOD benchmarks (e.g., ImageNet-C, ObjectNet, DomainBed) demonstrate significant improvements in robustness, consistently outperforming state-of-the-art SSL methods.
📝 Abstract
In this paper, we focus on the out-of-distribution (OOD) generalization of self-supervised learning (SSL). By analyzing mini-batch construction during the SSL training phase, we first give a plausible explanation for why SSL exhibits OOD generalization. Then, from the perspective of data generation and causal inference, we show that SSL learns spurious correlations during training, which reduces OOD generalization. To address this issue, we propose a post-intervention distribution (PID) grounded in the structural causal model. PID describes a scenario in which the spurious variable and the label variable are mutually independent. Moreover, we demonstrate that if each mini-batch during SSL training satisfies PID, the resulting SSL model achieves optimal worst-case OOD performance. This motivates us to develop a batch sampling strategy that enforces PID constraints by learning a latent variable model. Through theoretical analysis, we establish the identifiability of the latent variable model and validate the proposed sampling strategy, and experiments on various downstream OOD tasks further demonstrate its effectiveness.
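The paper's actual sampler relies on a learned latent variable model to infer the spurious attribute, which is not reproduced here. As a minimal sketch of the core idea, assuming spurious attributes have already been inferred for each sample, a batch can be drawn so that every (label, spurious) cell contributes equally, making the two variables independent within the batch (all names below are hypothetical, not from the paper):

```python
import random
from collections import Counter, defaultdict

def pid_batch_sampler(samples, batch_size, seed=0):
    """Draw a mini-batch whose empirical joint over (label, spurious)
    factorizes: each (label, spurious) cell contributes the same number
    of samples, so the spurious variable is independent of the label
    within the batch (the PID constraint, in its simplest discrete form)."""
    rng = random.Random(seed)
    # Group sample indices by their (label, spurious) cell.
    cells = defaultdict(list)
    for idx, (label, spurious) in enumerate(samples):
        cells[(label, spurious)].append(idx)
    # Equal quota per cell; sampling with replacement handles small cells.
    per_cell = max(1, batch_size // len(cells))
    batch = []
    for members in cells.values():
        batch.extend(rng.choices(members, k=per_cell))
    rng.shuffle(batch)
    return batch

# Toy data: 2 labels x 2 spurious attributes, 10 samples per cell.
samples = [(l, s) for l in (0, 1) for s in ("a", "b") for _ in range(10)]
batch = pid_batch_sampler(samples, batch_size=8)
cell_counts = Counter(samples[i] for i in batch)
```

With four cells and `batch_size=8`, each cell contributes exactly two samples, so the batch-level mutual information between label and spurious attribute is zero; the SSL objective itself (SimCLR, Barlow Twins, etc.) is untouched, which is why the strategy is objective-agnostic.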