DIVE: Embedding Compression via Self-Limiting Gradient Updates

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a failure mode of existing embedding compression methods: under label-scarce conditions they overfit, and their retrieval performance can fall below that of the frozen baseline. The authors propose DIVE, a compression adapter that combines a self-limiting hinge-based triplet loss with a head-wise NT-Xent contrastive loss to preserve retrieval performance while reducing embedding dimensionality. The self-limiting gradient bounds how far training perturbs the pretrained embedding space, while implicit view augmentation on top of a lightweight residual adapter supplies denser self-supervised signal when labeled data is limited. Evaluated on six BEIR datasets, DIVE outperforms the three baseline adapters on every dataset and at every evaluated compression ratio, using a 14M-parameter model. The code is publicly available.
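The self-limiting behaviour of a hinge-based triplet loss can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dot-product similarity, the margin of 0.2, and the function name `hinge_triplet` are all assumptions chosen for illustration. The point it shows is that a triplet which already satisfies the margin contributes exactly zero gradient, so it cannot move the embedding any further.

```python
import numpy as np

def hinge_triplet(q, p, n, margin=0.2):
    """Return (loss, grad_q) for one triplet.

    Assumption: similarity is a plain dot product and the loss is
    max(0, margin - q.p + q.n). Both choices are illustrative only.
    """
    violation = margin - q @ p + q @ n
    if violation <= 0.0:
        # Margin satisfied: loss and gradient are exactly zero,
        # so this triplet no longer perturbs the query embedding.
        return 0.0, np.zeros_like(q)
    # Margin violated: gradient of (q.n - q.p) with respect to q.
    return float(violation), n - p

q = np.array([1.0, 0.0])        # query embedding
p = np.array([0.9, 0.1])        # positive document
n_easy = np.array([0.0, 1.0])   # negative far from the query
n_hard = np.array([0.8, 0.2])   # negative close to the query

loss_easy, grad_easy = hinge_triplet(q, p, n_easy)
loss_hard, grad_hard = hinge_triplet(q, p, n_hard)
```

Here the easy triplet yields a zero loss and a zero gradient, while the hard triplet yields a positive loss and a gradient of `n_hard - p`. Once training pushes a hard triplet past the margin, it falls into the first case and stops contributing updates.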

📝 Abstract

High-dimensional embeddings from large language models impose significant storage and computational costs on vector search systems. Recent embedding compression methods, including Matryoshka-Adaptor (EMNLP 2024), Search-Adaptor (ACL 2024), and SMEC (EMNLP 2025), enable dimensionality reduction through lightweight residual adapters, but their training objectives cause severe overfitting when labeled data is scarce, degrading retrieval performance below the frozen baseline. We propose DIVE (Dimensionality reduction with Implicit View Ensembles), a compression adapter that addresses this failure through two mechanisms. First, a self-limiting hinge-based triplet loss produces zero gradient once a triplet satisfies the margin constraint, bounding the total perturbation applied to the pretrained embedding space. Second, a head-wise NT-Xent contrastive loss treats multiple learned projections of each embedding as implicit views, providing dense self-supervised gradients that compensate for the sparsity of the triplet signal on small datasets. Across six BEIR datasets, DIVE outperforms all three baseline adapters on every dataset and at every evaluated compression ratio, with a 14M-parameter open-source implementation.
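To make the second mechanism concrete, the following is a rough NumPy sketch of NT-Xent applied to two projection heads of the same batch. It is a simplified illustration under stated assumptions, not the paper's code: the two random linear heads `w_a` and `w_b`, the temperature of 0.1, the batch and dimension sizes, and the restriction to two heads are all hypothetical. Each embedding's two head outputs are treated as a positive pair, and every other row in the batch acts as a negative, so every example produces a gradient signal without needing labels.

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.1):
    """NT-Xent loss over two views; rows with the same index are positives."""
    b = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                      # drop self-pairs
    # Row i in the first half pairs with row i + b, and vice versa.
    pos = np.concatenate([np.arange(b, 2 * b), np.arange(0, b)])
    # Numerically stable row-wise log-softmax.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * b), pos].mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))     # stand-in for 8 pretrained embeddings
w_a = rng.normal(size=(32, 8))   # hypothetical projection head A
w_b = rng.normal(size=(32, 8))   # hypothetical projection head B

view_a, view_b = x @ w_a, x @ w_b
loss = nt_xent(view_a, view_b)

# Sanity check: correctly aligned views score lower than misaligned ones.
aligned = nt_xent(view_a, view_a)
misaligned = nt_xent(view_a, np.roll(view_a, 1, axis=0))
```

In contrast to the hinge triplet loss, this objective is nonzero for every row of the batch, which is the sense in which it supplies a dense signal when labeled triplets are scarce.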

Problem

Research questions and friction points this paper is trying to address.

embedding compression

overfitting

retrieval performance

limited labeled data

dimensionality reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding compression

self-limiting gradient

implicit view ensembles