Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-tuning dense retrieval models for domain specialization often degrades effectiveness, even with hard-negative mining and denoising techniques, because the standard pointwise InfoNCE loss impairs generalization. Method: We identify this failure mode and replace pointwise learning with listwise knowledge distillation from a cross-encoder teacher, augmented with high-quality synthetic queries generated by large language models (LLMs). Contribution/Results: Our approach achieves consistent improvements across multiple domain-specific benchmarks. Empirical analysis shows that LLM-generated queries match human-authored queries in training effectiveness, while the cross-encoder teacher's capacity remains the main performance bottleneck. All code and training scripts are publicly released.

📝 Abstract
While current state-of-the-art dense retrieval models exhibit strong out-of-domain generalization, they might fail to capture nuanced domain-specific knowledge. In principle, fine-tuning these models for specialized retrieval tasks should yield higher effectiveness than relying on a one-size-fits-all model, but in practice, results can disappoint. We show that standard fine-tuning methods using an InfoNCE loss can unexpectedly degrade effectiveness rather than improve it, even in domain-specific scenarios. This holds true even when applying widely adopted techniques such as hard-negative mining and negative denoising. To address this, we explore a training strategy that uses listwise distillation from a teacher cross-encoder, leveraging rich relevance signals to fine-tune the retriever. We further explore synthetic query generation using large language models. Through listwise distillation and training with a diverse set of queries ranging from natural user searches and factual claims to keyword-based queries, we achieve consistent effectiveness gains across multiple datasets. Our results also reveal that synthetic queries can rival human-written queries in training utility. However, we also identify limitations, particularly the effectiveness of cross-encoder teachers as a bottleneck. We release our code and scripts to encourage further research.
Problem

Research questions and friction points this paper is trying to address.

Improving domain-specific dense retrieval models
Addressing fine-tuning effectiveness degradation
Exploring synthetic query generation utility
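
For context, the pointwise InfoNCE loss the paper identifies as the source of degradation can be sketched as a per-query cross-entropy over one positive and several negatives. This is a minimal pure-Python sketch, not the authors' implementation; the function name, scores, and temperature are illustrative:

```python
import math

def info_nce_loss(pos_score: float, neg_scores: list[float],
                  temperature: float = 0.05) -> float:
    """Pointwise InfoNCE: cross-entropy that pushes the single positive
    passage's similarity above all negatives for one query."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return -(logits[0] - log_denom)

# One query with one positive and two hard negatives (illustrative scores).
loss = info_nce_loss(0.9, [0.7, 0.4])
```

Because each training example reduces to a single hard positive/negative split, the loss discards graded relevance information; this is the signal that listwise distillation restores.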
Innovation

Methods, ideas, or system contributions that make the work stand out.

Listwise distillation for fine-tuning
Synthetic query generation with LLMs
Cross-encoder teacher relevance signals
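
The listwise distillation idea above can be sketched as a KL divergence between the teacher cross-encoder's and the student bi-encoder's score distributions over a candidate list. This is a minimal sketch under assumed score shapes, not the paper's exact objective; names and the temperature are illustrative:

```python
import math

def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    zs = [s / temperature for s in scores]
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_distill_loss(student_scores: list[float],
                          teacher_scores: list[float],
                          temperature: float = 1.0) -> float:
    """KL(teacher || student) over one query's candidate list: the student
    learns to match the teacher's full relevance distribution instead of a
    single positive/negative label."""
    p = softmax(teacher_scores, temperature)  # cross-encoder teacher targets
    q = softmax(student_scores, temperature)  # bi-encoder student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Loss vanishes when the student reproduces the teacher's ranking scores.
agree = listwise_distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

In contrast to InfoNCE, every candidate in the list contributes a soft target, so partially relevant passages carry graded supervision from the teacher.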