DIVE: Embedding Compression via Self-Limiting Gradient Updates

📅 2026-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a failure mode of existing embedding compression methods: when labeled data is scarce, they overfit and often underperform the frozen baseline on retrieval tasks. To address this, the authors propose DIVE, a compression adapter that reduces embedding dimensionality while preserving retrieval quality by combining a self-limiting hinge-based triplet loss with a head-wise NT-Xent contrastive loss. The triplet loss stops producing gradient once a triplet satisfies its margin, which bounds how far the pretrained embedding space is perturbed. The contrastive loss treats multiple learned projections (heads) of each embedding as implicit views, supplying dense self-supervised signal in few-shot settings. Both objectives train a lightweight residual adapter. Evaluated on six BEIR datasets, DIVE outperforms three existing adapter-based compression methods (Matryoshka-Adaptor, Search-Adaptor, and SMEC) on every dataset and at every evaluated compression ratio, with a 14M-parameter implementation. The code is publicly available.
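The self-limiting property follows directly from the hinge form of the triplet loss. The minimal Python sketch below illustrates it under stated assumptions: dot-product similarity, a margin of 0.2, and plain gradient descent on the anchor only. The paper's actual similarity function, margin, and optimizer are not given here, and `hinge_triplet` and `train_anchor` are hypothetical names. Gradients are written out by hand so it is visible that they vanish once the margin is met.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def hinge_triplet(anchor: Vector, pos: Vector, neg: Vector,
                  margin: float = 0.2) -> Tuple[float, Vector, Vector, Vector]:
    """Hinge triplet loss on dot-product similarity (an assumption), with gradients.

    loss = max(0, margin - <a, p> + <a, n>)
    When <a, p> - <a, n> >= margin the triplet satisfies the margin:
    the loss and all three gradients are exactly zero.
    """
    violation = margin - dot(anchor, pos) + dot(anchor, neg)
    if violation <= 0.0:
        zeros = [0.0] * len(anchor)
        return 0.0, zeros, list(zeros), list(zeros)
    grad_anchor = [n - p for p, n in zip(pos, neg)]  # d/da = n - p
    grad_pos = [-a for a in anchor]                  # d/dp = -a
    grad_neg = list(anchor)                          # d/dn = a
    return violation, grad_anchor, grad_pos, grad_neg


def train_anchor(anchor: Vector, pos: Vector, neg: Vector, margin: float = 0.2,
                 lr: float = 0.1, steps: int = 100) -> Tuple[Vector, int]:
    """Hypothetical toy loop: gradient descent on the anchor only.

    Returns the final anchor and the number of steps that actually moved it.
    """
    a = list(anchor)
    active_steps = 0
    for _ in range(steps):
        loss, grad_a, _, _ = hinge_triplet(a, pos, neg, margin)
        if loss == 0.0:
            continue  # margin satisfied: zero gradient, anchor stays put
        active_steps += 1
        a = [x - lr * g for x, g in zip(a, grad_a)]
    return a, active_steps


anchor, pos = [1.0, 0.0], [0.9, 0.1]
easy_neg, hard_neg = [0.0, 1.0], [0.8, 0.2]
print(hinge_triplet(anchor, pos, easy_neg)[0])  # 0.0: already past the margin
final_anchor, active = train_anchor(anchor, pos, hard_neg)
print(active)  # updates stop once the margin is met, well before 100 steps
```

Because a satisfied triplet contributes neither loss nor gradient, the cumulative update applied to an embedding is limited to the steps during which its triplet violates the margin, rather than continuing to accumulate for the whole training run.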
📝 Abstract
High-dimensional embeddings from large language models impose significant storage and computational costs on vector search systems. Recent embedding compression methods, including Matryoshka-Adaptor (EMNLP 2024), Search-Adaptor (ACL 2024), and SMEC (EMNLP 2025), enable dimensionality reduction through lightweight residual adapters, but their training objectives cause severe overfitting when labeled data is scarce, degrading retrieval performance below the frozen baseline. We propose DIVE (Dimensionality reduction with Implicit View Ensembles), a compression adapter that addresses this failure through two mechanisms. First, a self-limiting hinge-based triplet loss produces zero gradient once a triplet satisfies the margin constraint, bounding the total perturbation applied to the pretrained embedding space. Second, a head-wise NT-Xent contrastive loss treats multiple learned projections of each embedding as implicit views, providing dense self-supervised gradients that compensate for the sparsity of the triplet signal on small datasets. Across six BEIR datasets, DIVE outperforms all three baseline adapters on every dataset and at every evaluated compression ratio, with a 14M-parameter open-source implementation.
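The abstract does not specify how the heads are paired or how positives and negatives are chosen. The sketch below assumes the simplest reading: each head's projection of an item is one view, every pair of heads is scored with standard two-view NT-Xent (cosine similarity, temperature `tau`, set to 0.1 by assumption), and the losses are averaged over all head pairs. `nt_xent` and `headwise_nt_xent` are hypothetical names, and the toy one-hot vectors stand in for learned projections.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a: List[Vector], view_b: List[Vector], tau: float = 0.1) -> float:
    """Standard two-view NT-Xent (SimCLR) loss for a batch of N items.

    view_a[i] and view_b[i] are two views of item i (a positive pair);
    every other vector in the 2N-vector batch acts as a negative.
    """
    n = len(view_a)
    z = view_a + view_b  # positive of z[i] is z[(i + n) % (2 * n)]
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)
        # Logits against every other vector (positive included, self excluded).
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - cosine(z[i], z[j]) / tau
    return total / (2 * n)


def headwise_nt_xent(heads: List[List[Vector]], tau: float = 0.1) -> float:
    """Hypothetical head-wise variant: heads[h][i] is head h's projection of item i.

    Every pair of heads is treated as two views, and the two-view NT-Xent
    losses are averaged over all head pairs.
    """
    pairs = list(combinations(range(len(heads)), 2))
    return sum(nt_xent(heads[a], heads[b], tau) for a, b in pairs) / len(pairs)


e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
aligned = [[e1, e2, e3]] * 3                        # every head agrees on every item
mixed = [[e1, e2, e3], [e2, e3, e1], [e1, e2, e3]]  # head 1 mismatches items
print(headwise_nt_xent(aligned))  # close to 0: each item's views match
print(headwise_nt_xent(mixed))    # much larger: head 1 pairs items with wrong views
```

Unlike the hinge term, this loss stays strictly positive for every item in the batch because the softmax over finite logits never assigns probability exactly 1 to the positive, so it keeps supplying gradient even after most triplets have stopped contributing. That density is what the abstract relies on to compensate for sparse triplet gradients on small datasets.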
Problem

Research questions and friction points this paper is trying to address.

embedding compression
overfitting
retrieval performance
limited labeled data
dimensionality reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding compression
self-limiting gradient
implicit view ensembles
contrastive learning
triplet loss