Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

📅 2024-02-19
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
In conversational search, existing query rewriting methods often produce suboptimal rewrites because they neglect feedback signals from retrieval results. This paper proposes RetPO, a retrieval-guided preference optimization framework that (1) prompts a large language model to generate diverse candidate rewrites; (2) elicits retrieval feedback on each rewrite—e.g., relevance-based rankings—and models it as structured preference signals; (3) constructs RF Collection, a large-scale dataset of retrievers' feedback covering over 410K query rewrites across 12K conversations; and (4) fine-tunes a lightweight rewrite model on this dataset, aligning the rewriter's behavior with the retriever's preferences end to end. Its core innovation is formalizing retrieval performance as structured preference signals for rewrite optimization. Evaluated on two recent conversational search benchmarks, RetPO significantly outperforms strong baselines, including GPT-3.5, improving both rewrite quality and downstream retrieval effectiveness and achieving state-of-the-art performance.
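Concretely, the feedback-collection step can be pictured as: prompt an LLM for several candidate rewrites of the in-context question, run each rewrite through the target retriever, and score it by where a known-relevant passage lands in the ranking. A minimal sketch in Python, assuming hypothetical `generate_rewrites` and `retrieve` helpers and a reciprocal-rank metric (the paper's actual prompts, retrievers, and scoring may differ):

```python
# Minimal sketch (not the authors' code) of collecting retrievers' feedback
# on candidate query rewrites. `generate_rewrites` and `retrieve` are
# hypothetical stand-ins for an LLM prompting call and an off-the-shelf
# retriever such as BM25 or a dense encoder.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScoredRewrite:
    text: str
    score: float  # retrieval metric for this rewrite (here: reciprocal rank)

def reciprocal_rank(ranked_ids: list[str], gold_id: str) -> float:
    """1/rank of the gold passage in the retrieved list, 0 if absent."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == gold_id:
            return 1.0 / rank
    return 0.0

def collect_feedback(
    dialogue: list[str],
    gold_id: str,
    generate_rewrites: Callable[[list[str], int], list[str]],
    retrieve: Callable[[str], list[str]],
    n: int = 8,
) -> list[ScoredRewrite]:
    """Generate n candidate rewrites, score each by retrieval performance,
    and return them ranked best-first (the retriever's 'preference')."""
    candidates = generate_rewrites(dialogue, n)  # prompt a large LM
    scored = [
        ScoredRewrite(q, reciprocal_rank(retrieve(q), gold_id))
        for q in candidates
    ]
    return sorted(scored, key=lambda r: r.score, reverse=True)

def to_preference_pairs(ranked: list[ScoredRewrite]) -> list[tuple[str, str]]:
    """Turn the ranking into (chosen, rejected) rewrite pairs, the supervision
    format that preference-optimization methods consume."""
    return [
        (winner.text, loser.text)
        for i, winner in enumerate(ranked)
        for loser in ranked[i + 1:]
        if winner.score > loser.score
    ]
```

Each ranked list thus yields (chosen, rejected) pairs; aggregated over many conversations, this is the kind of structure the 410K-rewrite RF collection provides.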

📝 Abstract
Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through this process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model achieves state-of-the-art performance on two recent conversational search benchmarks, significantly outperforming existing baselines, including GPT-3.5.
Problem

Research questions and friction points this paper is trying to address.

Optimizing query rewrites for conversational search alignment
Aligning language models with retriever system preferences
Improving retrieval performance via feedback-based query reformulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes an LM for query reformulation using retriever preferences
Constructs a large-scale dataset of retrievers' feedback
Fine-tunes a smaller LM to align it with this feedback (see the loss sketch below)
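The "preference optimization" in RetPO's name suggests an objective in the family of DPO (Direct Preference Optimization). As an illustration only, and not necessarily the authors' exact objective, the per-pair DPO loss over a (chosen, rejected) rewrite pair looks like this:

```python
# Sketch of a DPO-style objective for aligning the rewriter with the
# retriever's preferences; RetPO's actual training objective may differ.
import math

def dpo_loss(
    logp_chosen: float,       # log-prob of the preferred rewrite under the policy
    logp_rejected: float,     # log-prob of the dispreferred rewrite under the policy
    ref_logp_chosen: float,   # same quantities under the frozen reference model
    ref_logp_rejected: float,
    beta: float = 0.1,        # strength of the KL-style regularization
) -> float:
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair, where the
    margin measures how much the policy upweights the chosen rewrite relative
    to the reference model, versus the rejected one."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log1p(math.exp(-beta * margin))  # == -log(sigmoid(beta * margin))
```

Minimizing this loss pushes the fine-tuned rewriter to assign relatively higher probability to rewrites the retriever scored well, without drifting too far from the reference model.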
Chanwoong Yoon
Korea University
Gangwoo Kim
Korea University
Byeongguk Jeon
Korea University
Sungdong Kim
NAVER Cloud, KAIST AI
Yohan Jo
Seoul National University
Jaewoo Kang
Korea University