Enhancing VICReg: Random-Walk Pairing for Improved Generalization and Better Global Semantics Capturing

📅 2025-06-22

🤖 AI Summary

Self-supervised methods such as VICReg tend to overfit the training data distribution, which limits generalization to unseen samples and hinders the learning of globally consistent semantic representations. To address this, we propose SAG-VICReg, which (1) extends VICReg with a random-walk-based pairing strategy that strengthens the modeling of long-range semantic associations across samples, and (2) introduces an unsupervised evaluation metric based on spectral embedding that assesses global structure rather than only local similarity. Experiments show that SAG-VICReg achieves state-of-the-art or near-state-of-the-art performance on major self-supervised benchmarks, including ImageNet-1K, while significantly outperforming baselines in global semantic consistency. It also preserves strong local feature discriminability. The framework thus bridges local, pair-based representation learning and global structural coherence, yielding more robust and generalizable representations.
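The summary does not specify how the random-walk pairs are built. The following is a minimal sketch, assuming (our assumption, not the authors' stated procedure) that pairs come from short random walks on a k-nearest-neighbour graph over the samples' embeddings: each anchor is paired with the node where its walk ends, so a partner can lie several hops away instead of only among its immediate neighbours. `knn_graph`, `random_walk_pairs`, `k` and `walk_length` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical pairing scheme for illustration; not the paper's published algorithm.

def knn_graph(x, k):
    """Return, for each row of x, the indices of its k nearest neighbours (excluding itself)."""
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    dist = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def random_walk_pairs(x, k=5, walk_length=3, rng=None):
    """Pair each sample with the endpoint of a short random walk on its kNN graph."""
    rng = np.random.default_rng(rng)
    neighbours = knn_graph(x, k)
    n = x.shape[0]
    current = np.arange(n)
    for _ in range(walk_length):
        # Every walker hops to a uniformly chosen neighbour of its current node.
        choice = rng.integers(0, k, size=n)
        current = neighbours[current, choice]
    # Column 0: anchor index, column 1: partner index.
    return np.stack([np.arange(n), current], axis=1)
```

Longer walks reach semantically related but non-adjacent samples, which is one plausible way to encourage the long-range associations the summary describes. Because a walk can return to its starting point, a practical version would probably resample or exclude self-pairs; the sketch keeps them for simplicity.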

📝 Abstract

In this paper, we argue that viewing VICReg, a popular self-supervised learning (SSL) method, through the lens of spectral embedding reveals a potential source of sub-optimality: it may struggle to generalize robustly to unseen data because it relies too heavily on the training data. This observation invites a closer look at how well the method achieves its goal of producing meaningful representations for images outside the training set. Here, we investigate this issue and introduce SAG-VICReg (Stable and Generalizable VICReg), a method that builds on VICReg by incorporating new training techniques. These enhancements improve the model's ability to capture global semantics within the data and strengthen its generalization capabilities. Experiments demonstrate that SAG-VICReg effectively addresses the generalization challenge while matching or surpassing diverse state-of-the-art SSL baselines. Notably, our method performs better on metrics designed to evaluate global semantic understanding while maintaining competitive results on local evaluation metrics. Furthermore, we propose a new standalone evaluation metric for embeddings that complements standard evaluation methods and accounts for the global data structure without requiring labels, which matters most when labeled data is scarce or unavailable.
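For reference, the baseline objective that SAG-VICReg builds on combines three terms: an invariance term between the embeddings of two views, a variance hinge that keeps each embedding dimension's standard deviation near 1, and a covariance penalty that decorrelates dimensions. The NumPy sketch below mirrors the standard VICReg loss as we understand it, assuming the commonly used defaults (coefficients 25, 25 and 1, target standard deviation 1, invariance as element-wise mean squared error). It does not include any of SAG-VICReg's modifications, and `vicreg_loss` and its parameter names are illustrative.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0, eps=1e-4):
    """Standard VICReg objective for two (n, d) batches of embeddings (illustrative sketch)."""
    n, d = z_a.shape

    # Invariance: embeddings of the two views should agree.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge on each dimension's standard deviation (target 1) to prevent collapse.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Sum of squared off-diagonal covariances, scaled by the dimension d.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return sim_coeff * invariance + std_coeff * variance + cov_coeff * covariance
```

Because the variance and covariance terms are unaffected by a constant shift, `vicreg_loss(z, z + 1.0)` exceeds `vicreg_loss(z, z)` by `sim_coeff` (up to floating-point error). A fully collapsed batch of zeros is penalized only through the variance hinge.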

Problem

Research questions and friction points this paper is trying to address.

Improving VICReg generalization to unseen data

Enhancing global semantics capture in SSL

Proposing label-free embedding evaluation metric

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances VICReg with random-walk pairing

Improves capture of global semantics

Introduces label-free embedding evaluation metric
