Self-Organizing Visual Prototypes for Non-Parametric Representation Learning

📅 2025-05-23

🤖 AI Summary

Existing prototype-based self-supervised learning methods rely on a single prototype to represent all features within a cluster, which fails to capture the semantic diversity of that region of the feature space. This work proposes Self-Organizing Prototypes (SOP), which replaces the single prototype with multiple semantically similar support embeddings (SEs) that jointly model the local feature structure. Methodologically: (i) it represents each prototype through several complementary SEs; (ii) it introduces SOP-MIM, a non-parametric masked image modeling task; and (iii) it combines a non-parametric contrastive objective with masked reconstruction, both built on dynamically organized SEs rather than learned prototype parameters. SOP achieves state-of-the-art performance on many retrieval benchmarks and is also evaluated on linear evaluation, fine-tuning, and object detection, with gains that grow as the encoder becomes more complex.

📝 Abstract

We present Self-Organizing Visual Prototypes (SOP), a new training technique for unsupervised visual feature learning. Unlike existing prototypical self-supervised learning (SSL) methods that rely on a single prototype to encode all relevant features of a hidden cluster in the data, we propose the SOP strategy. In this strategy, a prototype is represented by many semantically similar representations, or support embeddings (SEs), each containing a complementary set of features that together better characterize their region in space and maximize training performance. We reaffirm the feasibility of non-parametric SSL by introducing novel non-parametric adaptations of two loss functions that implement the SOP strategy. Notably, we introduce the SOP Masked Image Modeling (SOP-MIM) task, where masked representations are reconstructed from the perspective of multiple non-parametric local SEs. We comprehensively evaluate the representations learned using the SOP strategy on a range of benchmarks, including retrieval, linear evaluation, fine-tuning, and object detection. Our pre-trained encoders achieve state-of-the-art performance on many retrieval benchmarks and demonstrate increasing performance gains with more complex encoders.
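The idea of describing a region of feature space through several support embeddings, rather than one prototype, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the memory bank of past embeddings, the cosine top-k selection rule, the function names `select_support_embeddings` and `multi_positive_loss`, and the values of `k` and `temperature` are all hypothetical. The loss shown is a generic multi-positive softmax objective, not the authors' formulation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def select_support_embeddings(anchor, memory, k=4):
    """Pick the k memory entries most similar to the anchor (hypothetical rule).

    anchor: (d,) embedding of one image view.
    memory: (n, d) bank of stored embeddings (an assumed data structure).
    Returns the indices of the chosen entries and their normalized vectors.
    """
    anchor_n = l2_normalize(anchor)
    memory_n = l2_normalize(memory)
    sims = memory_n @ anchor_n          # cosine similarity to every entry
    idx = np.argsort(-sims)[:k]         # top-k, most similar first
    return idx, memory_n[idx]

def multi_positive_loss(view, memory, se_idx, temperature=0.1):
    """Generic multi-positive objective (an assumption, not the paper's loss).

    Treats every selected support embedding as a positive for `view` and
    averages the negative log-probability assigned to those positives by a
    softmax over the whole memory bank.
    """
    logits = l2_normalize(memory) @ l2_normalize(view) / temperature
    shifted = logits - logits.max()     # stabilize the log-sum-exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[se_idx].mean())
```

In this sketch no prototype vectors are learned: the positives for a view are looked up among stored embeddings, which is one way to read "non-parametric" in the abstract. How the paper actually selects and organizes SEs may differ.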

Problem

Research questions and friction points this paper is trying to address.

Develops self-organizing visual prototypes for unsupervised learning

Enhances feature representation with multiple complementary embeddings

Introduces non-parametric adaptations for improved SSL performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple support embeddings represent prototypes

Non-parametric adaptations of loss functions

SOP Masked Image Modeling task introduced
