Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins

📅 2024-07-31
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper targets the inefficiency of biencoder retrieval models, namely their reliance on teacher models, complex batch-sampling strategies, and low training throughput, by proposing a parameter-free self-distillation loss. The method eliminates the need for external teachers or explicit hard-negative sampling by exploiting the pretrained language modeling capabilities of the encoder itself to perform implicit hard negative mining and self-supervised optimization. The authors further introduce an adaptive relevance margin to sharpen representation discriminability. Empirically, the approach matches teacher-based distillation baselines using only 13.5% of the training data while training 3-15x faster. All code and data are publicly released.

📝 Abstract
Representation-based retrieval models, so-called biencoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art biencoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch-sampling. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We investigate the capabilities of our proposed approach through extensive ablation studies, demonstrating that self-distillation can match the effectiveness of teacher distillation using only 13.5% of the data, while offering a speedup in training time between 3x and 15x compared to parametrized losses. Code and data are made openly available.
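The scoring step the abstract describes, relevance as the similarity of query and document embeddings, with the other documents in a batch serving as implicit negatives, can be sketched as follows (the function name and tensor shapes are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def biencoder_scores(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """All-pairs cosine similarity between query and document embeddings.

    query_emb: (batch, dim), doc_emb: (batch, dim). Entry [i, j] scores
    query i against document j, so the diagonal holds the positive pairs
    and every off-diagonal entry acts as an in-batch negative.
    """
    q = F.normalize(query_emb, dim=-1)  # unit-length query vectors
    d = F.normalize(doc_emb, dim=-1)    # unit-length document vectors
    return q @ d.T                      # (batch, batch) cosine-similarity matrix
```

Scoring all pairs in one matrix product is what lets a batch double as a pool of negatives without any explicit sampling step.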
Problem

Research questions and friction points this paper is trying to address.

Improving bi-encoder retrieval models without teacher distillation
Reducing training cost via self-supervised parameter-free loss
Enhancing efficiency with implicit hard negative mining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervision with adaptive relevance margins
Parameter-free loss function for training
Implicit hard negative mining technique
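One way the three ideas above could combine is a hinge loss over the in-batch similarity matrix, where each query's margin is derived from its own positive score rather than a fixed hyperparameter. This is only a minimal sketch under stated assumptions: `alpha` and the margin-from-positive-score rule are hypothetical illustrations, not the paper's exact formulation.

```python
import torch

def adaptive_margin_loss(scores: torch.Tensor, alpha: float = 0.8) -> torch.Tensor:
    """Hinge loss over in-batch negatives with a per-query relevance margin.

    `scores` is a (batch, batch) similarity matrix whose diagonal holds the
    positive query-document pairs; every off-diagonal entry is an implicit
    negative. The margin is scaled from each query's own positive score
    (`alpha` and this scaling rule are hypothetical, not from the paper).
    """
    batch = scores.size(0)
    pos = scores.diag().unsqueeze(1)   # (batch, 1) positive scores
    margin = alpha * pos.detach()      # per-query adaptive margin
    # penalize any negative that comes within `margin` of its positive
    hinge = torch.clamp(margin - (pos - scores), min=0.0)
    off_diag = ~torch.eye(batch, dtype=torch.bool, device=scores.device)
    return hinge[off_diag].mean()
```

Because the hinge is zero for negatives that already sit far below the positive, gradients concentrate on the hardest in-batch negatives, which is the sense in which such a loss mines hard negatives implicitly.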