🤖 AI Summary
Self-supervised learning (SSL) representations are often constrained by the inductive bias of a single backbone architecture. Method: We propose Heterogeneous Self-Supervised Learning (HSSL), a framework that introduces lightweight, structurally heterogeneous auxiliary heads—e.g., Transformer- and CNN-based—alongside the fixed backbone, enabling collaborative representation optimization without modifying the main network. Contribution/Results: We systematically demonstrate, for the first time, a positive correlation between architectural heterogeneity and representation quality. Leveraging this insight, we design a disparity-driven representation distillation mechanism and an efficient auxiliary-head search strategy. Extensive experiments across image classification, semantic/instance segmentation, and object detection show that HSSL consistently outperforms leading SSL methods—including MoCo and SimCLR—while maintaining compatibility with diverse contrastive learning baselines. This validates both the effectiveness and generalizability of architectural complementarity in self-supervised representation learning.
📝 Abstract
Incorporating heterogeneous representations from different architectures has facilitated various vision tasks; e.g., some hybrid networks combine transformers and convolutions. However, the complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture differs from that of the base model. In this process, HSSL endows the base model with new characteristics through representation learning, without any structural changes. To comprehensively understand HSSL, we conduct experiments on various heterogeneous pairs, each containing a base model and an auxiliary head. We discover that the representation quality of the base model improves as the architectural discrepancy between the two grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model to learn from, as well as several simple but effective methods to enlarge the model discrepancy. HSSL is compatible with various self-supervised methods and achieves superior performance on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection.
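The abstract describes the core mechanism at a high level: the base model is pulled toward the representations of a structurally heterogeneous auxiliary head. The paper's exact distillation objective is not given here, so the sketch below illustrates one plausible choice (a cosine-alignment loss, an assumption on our part) between base and auxiliary features; `hetero_distill_loss` and the toy encoders are hypothetical names, not the authors' API.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize rows to unit length (eps guards against zero vectors)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def hetero_distill_loss(z_base, z_aux):
    """Cosine-alignment loss pulling base-model features toward the
    heterogeneous auxiliary head's features. In an actual training
    loop a stop-gradient would typically be applied to z_aux.
    Returns a scalar in [0, 2]: 0 = perfectly aligned, 2 = opposed."""
    zb = l2_normalize(z_base)
    za = l2_normalize(z_aux)
    return float(np.mean(1.0 - np.sum(zb * za, axis=-1)))

# Toy example: stand-ins for base (e.g., CNN) and auxiliary
# (e.g., Transformer) features for a batch of 4 samples, 8 dims.
rng = np.random.default_rng(0)
z_base = rng.normal(size=(4, 8))
z_aux = z_base + 0.1 * rng.normal(size=(4, 8))  # nearly aligned pair
print(hetero_distill_loss(z_base, z_aux))   # small value near 0
print(hetero_distill_loss(z_base, -z_base))  # fully opposed features
```

The loss is minimized when the base model's (normalized) features match the auxiliary head's, which is how the auxiliary architecture's inductive bias can shape the base representation without changing the base network's structure.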