🤖 AI Summary
This paper identifies inherent limitations of the Kullback–Leibler (KL) divergence in representation learning—including asymmetry, unboundedness, and misalignment with downstream objectives. To address these issues, we propose the Beyond I-Con framework, the first systematic approach to co-designing statistical divergences (e.g., total variation distance and other bounded f-divergences) and similarity kernels (e.g., distance-based kernels) for loss construction. The framework decouples divergence selection from kernel design, enabling task-adaptive representation learning. Experiments on unsupervised clustering of DINO-ViT embeddings, supervised contrastive learning, and nonlinear dimensionality reduction show that our method consistently outperforms KL-divergence-based baselines paired with angular kernels. Notably, the improvements carry over to downstream classification and retrieval, validating both the effectiveness and the generality of divergence–kernel co-design.
📝 Abstract
The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables systematic discovery of novel loss functions by exploring alternative statistical divergences and similarity kernels. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use total variation (TV) distance; (2) on supervised contrastive learning, we outperform the standard approach by using TV and a distance-based similarity kernel instead of KL and an angular kernel; (3) on dimensionality reduction, we achieve superior qualitative results and better performance on downstream tasks than SNE by replacing KL with a bounded f-divergence. Our results highlight the importance of considering divergence and similarity kernel choices in representation learning optimization.
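The optimization issues the abstract attributes to KL divergence are easy to see numerically. The sketch below (illustrative only, not the paper's loss construction) compares KL divergence and total variation (TV) distance on toy discrete "neighbor" distributions: KL is asymmetric and blows up when the learned distribution assigns near-zero mass where the target has mass, while TV is symmetric and bounded by 1.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions.

    Asymmetric, and unbounded as any q_i -> 0 while p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv_distance(p, q):
    """Total variation distance: symmetric and bounded in [0, 1]."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Toy target and learned distributions over 3 neighbors
p = [0.8, 0.15, 0.05]
q = [0.3, 0.4, 0.3]

print(kl_divergence(p, q))  # != kl_divergence(q, p): asymmetric
print(kl_divergence(q, p))
print(tv_distance(p, q))    # == tv_distance(q, p), and always <= 1

# When q nearly misses the mass of p, KL explodes but TV stays bounded
q_degenerate = [1e-6, 0.499999, 0.5]
print(kl_divergence(p, q_degenerate))  # large
print(tv_distance(p, q_degenerate))    # still <= 1
```

A bounded, symmetric divergence yields gradients that do not explode on hard (near-disjoint) pairs, which is one intuition behind swapping KL for TV or another bounded f-divergence in the losses above.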