🤖 AI Summary
This work addresses a limitation of traditional retrieval-augmented generation (RAG) systems: they rely on relevance-based retrieval, which often fails to ensure that retrieved documents actually improve downstream generation quality. Existing utility-driven approaches aim to align retrieval with generation performance, but they incur high computational overhead and lack listwise ranking optimization. To bridge this gap, the authors propose LURE-RAG, a framework that places a lightweight LambdaMART reranker after any black-box retriever. LURE-RAG is the first to combine a listwise ranking loss with utility signals derived from large language model feedback, directly optimizing document rankings to improve generation outcomes. The method strikes a compelling balance between efficiency and effectiveness, attaining 97-98% of the performance of the state-of-the-art dense neural baseline; its dense variant, UR-RAG, further surpasses the best existing baseline by up to 3%, while remaining efficient in both training and inference.
📝 Abstract
Most conventional Retrieval-Augmented Generation (RAG) pipelines rely on relevance-based retrieval, which often misaligns with utility, i.e., whether the retrieved passages actually improve the quality of the generated text for a downstream task such as question answering or query-based summarization. Existing utility-driven retrieval approaches for RAG have two limitations: first, they are resource-intensive, typically requiring query encoding; second, they do not employ a listwise ranking loss during training. The latter limitation is particularly critical, as the relative order of documents directly affects generation in RAG. To address this gap, we propose Lightweight Utility-driven Reranking for Efficient RAG (LURE-RAG), a framework that augments any black-box retriever with an efficient LambdaMART-based reranker. Unlike prior methods, LURE-RAG trains the reranker with a listwise ranking loss guided by LLM utility, thereby directly optimizing the ordering of retrieved documents. Experiments on two standard datasets demonstrate that LURE-RAG achieves competitive performance, reaching 97-98% of the state-of-the-art dense neural baseline, while remaining efficient in both training and inference. Moreover, its dense variant, UR-RAG, significantly outperforms the best existing baseline by up to 3%.
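The central idea — scoring a ranking by LLM-derived utility rather than human relevance judgments — can be illustrated with a utility-weighted NDCG, the kind of listwise metric that LambdaMART-style training optimizes. This is a minimal sketch, not the paper's implementation: the `utility_ndcg` helper and the example utility labels are hypothetical, standing in for per-document utility signals obtained from LLM feedback.

```python
import math

def dcg(gains):
    """Discounted cumulative gain for gains listed in rank order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def utility_ndcg(utilities, ranking):
    """NDCG of `ranking` (a permutation of document indices), where each
    document's gain is its LLM-derived utility label instead of a human
    relevance judgment."""
    ranked_gains = [utilities[d] for d in ranking]
    ideal_gains = sorted(utilities, reverse=True)
    return dcg(ranked_gains) / dcg(ideal_gains)

# Hypothetical labels: the generator answers best when document 2 is
# placed first, even if a relevance-only retriever would rank it lower.
utilities = [0.2, 0.4, 0.9]   # per-document utility from LLM feedback
print(utility_ndcg(utilities, [2, 1, 0]))  # utility-optimal ordering
print(utility_ndcg(utilities, [1, 0, 2]))  # relevance-style ordering
```

A listwise objective such as LambdaMART trains the reranker so that document orderings with higher utility NDCG receive higher scores, which is how the relative order of retrieved documents — rather than individual relevance — is optimized.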