Aligning Dense Retrievers with LLM Utility via Distillation

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations in retrieval-augmented generation (RAG): dense retrieval often has suboptimal precision, while large language model (LLM)-based utility re-ranking is computationally expensive and subject to noise in perplexity estimation. The authors formulate retrieval as a distribution matching problem and introduce a Utility-Modulated InfoNCE objective that distills utility signals, derived from the reduction in LLM perplexity, into the bi-encoder embedding space. The retriever is thereby aligned with the utility distribution without invoking the LLM at inference time. On QASPER, the method outperforms the BGE-Base baseline by 30.59% in Recall@1, 30.16% in MAP, and 17.3% in Token F1, while running over 180 times faster than LLM-based re-ranking.

📝 Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder with a Utility-Modulated InfoNCE objective to imitate a utility distribution derived from perplexity reduction. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
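
The abstract does not spell out the Utility-Modulated InfoNCE objective, so the following is only a minimal sketch of the general idea of distribution matching, under stated assumptions. It assumes the utility of a passage is the drop in answer perplexity when that passage is added to the LLM context, that the target distribution is a softmax over those drops, and that the bi-encoder is trained with a cross-entropy between this target and its own softmax over query-passage similarities. All function names, temperatures, and numbers below are hypothetical and may differ from the paper's actual formulation.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def utility_targets(ppl_without, ppl_with, temperature=1.0):
    """Assumed teacher signal: softmax over perplexity reductions.

    ppl_without: answer perplexity with no retrieved passage.
    ppl_with:    answer perplexity with each candidate passage in context.
    """
    gains = [ppl_without - p for p in ppl_with]
    return [math.exp(lp) for lp in log_softmax(gains, temperature)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_infonce(query_vec, passage_vecs, targets, tau=0.5):
    """Cross-entropy between utility targets and retriever probabilities.

    With a one-hot target this reduces to the standard InfoNCE loss.
    """
    sims = [dot(query_vec, p) for p in passage_vecs]
    log_probs = log_softmax(sims, tau)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Toy data (illustrative only): passage 0 lowers perplexity the most.
targets = utility_targets(ppl_without=20.0, ppl_with=[8.0, 18.0, 21.0],
                          temperature=4.0)
passages = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_aligned = soft_infonce([1.0, 0.0], passages, targets)
loss_misaligned = soft_infonce([0.0, 1.0], passages, targets)
```

In this toy setup the query embedding that points at the high-utility passage receives the lower loss, which is the behavior a graded utility signal is meant to encourage. Because the LLM is only needed to compute the targets during training, inference remains a plain nearest-neighbor search over embeddings.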

Problem

Research questions and friction points this paper is trying to address.

dense retrieval

LLM utility

retrieval precision

computational efficiency

perplexity noise

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utility-Aligned Embeddings

Dense Retrieval

Retrieval-Augmented Generation