Seeing the Whole in the Parts in Self-Supervised Representation Learning

📅 2025-01-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Recent self-supervised learning (SSL) methods model spatial co-occurrences of visual features by masking portions of an image or by cropping it aggressively. Method: This paper proposes CO-SSL, a family of instance discrimination methods that instead model spatial co-occurrence by aligning local representations, taken *before* pooling, with a global image representation. Results: With 100 pre-training epochs on ImageNet-1K, CO-SSL reaches 71.5% top-1 accuracy and outperforms previous methods on several datasets. It is also more robust to noise corruption, internal corruption, small adversarial attacks, and large training crop sizes. The authors' analysis indicates that CO-SSL learns highly redundant local representations, which offers an explanation for this robustness and suggests that local-global alignment may be a powerful principle for unsupervised category learning.

📝 Abstract
Recent successes in self-supervised learning (SSL) model spatial co-occurrences of visual features either by masking portions of an image or by aggressively cropping it. Here, we propose a new way to model spatial co-occurrences by aligning local representations (before pooling) with a global image representation. We present CO-SSL, a family of instance discrimination methods, and show that it outperforms previous methods on several datasets, including ImageNet-1K, where it achieves 71.5% top-1 accuracy with 100 pre-training epochs. CO-SSL is also more robust to noise corruption, internal corruption, small adversarial attacks, and large training crop sizes. Our analysis further indicates that CO-SSL learns highly redundant local representations, which offers an explanation for its robustness. Overall, our work suggests that aligning local and global representations may be a powerful principle of unsupervised category learning.
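
The abstract does not spell out the loss, so the following is only a rough sketch of what aligning local representations (before pooling) with a global image representation could look like. It assumes an InfoNCE-style instance discrimination objective, mean pooling as the global summary, and a temperature of 0.1; none of these choices is confirmed by the source, and the names `mean_pool` and `local_global_loss` are hypothetical. Plain Python lists are used for readability rather than speed.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(local_feats):
    """Average the local vectors of one image into a single global vector.

    Assumption: mean pooling stands in for whatever pooling the encoder uses.
    """
    n = len(local_feats)
    return [sum(col) / n for col in zip(*local_feats)]


def local_global_loss(local_maps, global_vecs, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated objective).

    local_maps[i]  : list of local feature vectors of image i (one view)
    global_vecs[i] : global vector of image i (another view)
    Each local vector of image i should be more similar to global_vecs[i]
    than to the global vectors of the other images in the batch.
    """
    globals_n = [l2_normalize(g) for g in global_vecs]
    total, count = 0.0, 0
    for i, local_feats in enumerate(local_maps):
        for loc in local_feats:
            loc_n = l2_normalize(loc)
            logits = [dot(loc_n, g) / temperature for g in globals_n]
            # Numerically stable log-sum-exp over the batch.
            m = max(logits)
            log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_sum - logits[i]  # cross-entropy with target i
            count += 1
    return total / count


# Toy batch: 2 images, 2 local vectors each, 2-dimensional features.
view_a = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
view_b = [[[1.0, 0.1], [0.8, 0.0]], [[0.1, 1.0], [0.0, 0.8]]]
global_b = [mean_pool(feats) for feats in view_b]

aligned = local_global_loss(view_a, global_b)
shuffled = local_global_loss(view_a, global_b[::-1])  # wrong pairing
```

In this toy batch the correctly paired loss (`aligned`) is lower than the loss with the global vectors swapped (`shuffled`), which is the behaviour an alignment objective of this kind is meant to reward. A real implementation would operate on encoder feature maps with a tensor library rather than nested lists.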

Problem

Research questions and friction points this paper is trying to address.

Unsupervised Learning
Image Recognition
Robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

CO-SSL
Self-supervised Learning
Feature Matching