🤖 AI Summary
To address the challenge of jointly optimizing a task loss and feature dispersion in high-dimensional embedding spaces, this work focuses on learning dispersed representations on the unit hypersphere to enhance the discriminability of text and image embeddings. Methodologically, we propose three contributions: (1) a reinterpretation of pairwise dispersion as a Maximum Mean Discrepancy (MMD) optimization problem; (2) an online, hypersphere-adapted variant of Lloyd's algorithm, enabling streaming clustering of embeddings and dispersion control; and (3) an explicit dispersion regularization term grounded in hyperspherical geometry that integrates seamlessly into standard frameworks such as contrastive learning. Experiments demonstrate significant improvements in generalization across image classification and NLP tasks. Furthermore, the methods exhibit complementary strengths: some variants excel in low-dimensional or small-batch regimes, while others perform better in high-dimensional or large-batch settings.
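The streaming-clustering idea in (2) can be illustrated with a minimal sketch of an online Lloyd-style update adapted to the sphere. The function name, the 1/count step size, and the renormalization step below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def online_spherical_lloyd_step(centroids, counts, x):
    """One online Lloyd-style update on the unit sphere (illustrative sketch).

    Assigns x to its nearest centroid by cosine similarity, moves that
    centroid toward x with the standard 1/count online k-means rate, then
    projects it back onto the sphere. The paper's variant may differ.
    """
    x = x / np.linalg.norm(x)
    k = int(np.argmax(centroids @ x))        # nearest centroid (cosine similarity)
    counts[k] += 1
    eta = 1.0 / counts[k]                    # standard online k-means step size
    c = (1.0 - eta) * centroids[k] + eta * x
    centroids[k] = c / np.linalg.norm(c)     # renormalize onto the hypersphere
    return k
```

Streaming embeddings through such an update maintains a small set of unit-norm centroids summarizing the embedding distribution, to which a dispersion penalty can then be applied instead of to all pairwise distances.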
📝 Abstract
Learning well-separated features in high-dimensional spaces, such as text or image embeddings, is crucial for many machine learning applications. Achieving such separation can be effectively accomplished through the dispersion of embeddings, where unrelated vectors are pushed apart as much as possible. By constraining features to be on a hypersphere, we can connect dispersion to well-studied problems in mathematics and physics, where optimal solutions are known for limited low-dimensional cases. However, in representation learning we typically deal with a large number of features in high-dimensional space, and moreover, dispersion is usually traded off with some other task-oriented training objective, making existing theoretical and numerical solutions inapplicable. Therefore, it is common to rely on gradient-based methods to encourage dispersion, usually by minimizing some function of the pairwise distances. In this work, we first give an overview of existing methods from disconnected literature, making new connections and highlighting similarities. Next, we introduce some new angles. We propose to reinterpret pairwise dispersion using a maximum mean discrepancy (MMD) motivation. We then propose an online variant of the celebrated Lloyd's algorithm, of K-Means fame, as an effective alternative regularizer for dispersion on generic domains. Finally, we derive a novel dispersion method that directly exploits properties of the hypersphere. Our experiments show the importance of dispersion in image classification and natural language processing tasks, and how algorithms exhibit different trade-offs in different regimes.
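As a concrete instance of "minimizing some function of the pairwise distances", the sketch below implements one common dispersion penalty on the unit hypersphere: a log-sum-exp over pairwise cosine similarities. The function name and the temperature `t` are illustrative choices, not the paper's specific formulation:

```python
import numpy as np

def pairwise_dispersion_loss(X, t=2.0):
    """Generic pairwise dispersion penalty on the unit hypersphere (a sketch).

    Projects the rows of X onto the unit sphere and penalizes pairs that lie
    close together, via a log-sum-exp over pairwise cosine similarities (one
    common choice; the paper surveys several alternatives).
    """
    Z = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the sphere
    sim = Z @ Z.T                                     # pairwise cosine similarities
    mask = ~np.eye(len(Z), dtype=bool)                # exclude self-similarity
    return np.log(np.mean(np.exp(t * sim[mask])))     # lower = more dispersed
```

Minimizing such a term with gradient descent, typically alongside a task loss, pushes unrelated embeddings apart on the sphere: a tightly clustered set of vectors incurs a higher penalty than a well-spread one.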