Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

📅 2024-02-19
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
In conversational search, existing query rewriting methods often produce suboptimal rewrites because they neglect feedback signals from retrieval results. This paper proposes RetPO, a retrieval-guided preference optimization framework that (1) prompts a large language model to generate diverse candidate rewrites; (2) elicits retrieval feedback on each rewrite—e.g., relevance-based rankings—and models it as structured preference signals; (3) constructs RF Collection, a large-scale dataset of retrievers' feedback covering over 410K query rewrites across 12K conversations; and (4) fine-tunes a lightweight rewrite model on this dataset, aligning the rewriter's behavior with the retriever's preferences end to end. Its core innovation is formalizing retrieval performance as structured preference signals for rewrite optimization. Evaluated on two recent conversational search benchmarks, RetPO significantly outperforms strong baselines, including GPT-3.5, improving both rewrite quality and downstream retrieval effectiveness and achieving state-of-the-art performance.
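Concretely, the feedback-collection step can be pictured as: prompt an LLM for several candidate rewrites of the in-context question, run each rewrite through the target retriever, and score it by where a known-relevant passage lands in the ranking. A minimal sketch in Python, assuming hypothetical `generate_rewrites` and `retrieve` helpers and a reciprocal-rank metric (the paper's actual prompts, retrievers, and scoring may differ):

```python
# Minimal sketch (not the authors' code) of collecting retrievers' feedback
# on candidate query rewrites. `generate_rewrites` and `retrieve` are
# hypothetical stand-ins for an LLM prompting call and an off-the-shelf
# retriever such as BM25 or a dense encoder.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScoredRewrite:
    text: str
    score: float  # retrieval metric for this rewrite (here: reciprocal rank)

def reciprocal_rank(ranked_ids: list[str], gold_id: str) -> float:
    """1/rank of the gold passage in the retrieved list, 0 if absent."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == gold_id:
            return 1.0 / rank
    return 0.0

def collect_feedback(
    dialogue: list[str],
    gold_id: str,
    generate_rewrites: Callable[[list[str], int], list[str]],
    retrieve: Callable[[str], list[str]],
    n: int = 8,
) -> list[ScoredRewrite]:
    """Generate n candidate rewrites, score each by retrieval performance,
    and return them ranked best-first (the retriever's 'preference')."""
    candidates = generate_rewrites(dialogue, n)  # prompt a large LM
    scored = [
        ScoredRewrite(q, reciprocal_rank(retrieve(q), gold_id))
        for q in candidates
    ]
    return sorted(scored, key=lambda r: r.score, reverse=True)

def to_preference_pairs(ranked: list[ScoredRewrite]) -> list[tuple[str, str]]:
    """Turn the ranking into (chosen, rejected) rewrite pairs, the supervision
    format that preference-optimization methods consume."""
    return [
        (winner.text, loser.text)
        for i, winner in enumerate(ranked)
        for loser in ranked[i + 1:]
        if winner.score > loser.score
    ]
```

Each ranked list thus yields (chosen, rejected) pairs; aggregated over many conversations, this is the kind of structure the 410K-rewrite RF collection provides.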

📝 Abstract
Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through this process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model achieves state-of-the-art performance on two recent conversational search benchmarks, significantly outperforming existing baselines, including GPT-3.5.
Problem

Research questions and friction points this paper is trying to address.

Optimizing query rewrites for conversational search alignment
Aligning language models with retriever system preferences
Improving retrieval performance via feedback-based query reformulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes an LM for query reformulation using retriever preferences
Constructs a large-scale dataset of retrievers' feedback
Fine-tunes a smaller LM to align it with this feedback (see the loss sketch below)
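The "preference optimization" in RetPO's name suggests an objective in the family of DPO (Direct Preference Optimization). As an illustration only, and not necessarily the authors' exact objective, the per-pair DPO loss over a (chosen, rejected) rewrite pair looks like this:

```python
# Sketch of a DPO-style objective for aligning the rewriter with the
# retriever's preferences; RetPO's actual training objective may differ.
import math

def dpo_loss(
    logp_chosen: float,       # log-prob of the preferred rewrite under the policy
    logp_rejected: float,     # log-prob of the dispreferred rewrite under the policy
    ref_logp_chosen: float,   # same quantities under the frozen reference model
    ref_logp_rejected: float,
    beta: float = 0.1,        # strength of the KL-style regularization
) -> float:
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair, where the
    margin measures how much the policy upweights the chosen rewrite relative
    to the reference model, versus the rejected one."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return math.log1p(math.exp(-beta * margin))  # == -log(sigmoid(beta * margin))
```

Minimizing this loss pushes the fine-tuned rewriter to assign relatively higher probability to rewrites the retriever scored well, without drifting too far from the reference model.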
Chanwoong Yoon
Korea University
Gangwoo Kim
Korea University
Byeongguk Jeon
Korea University
Sungdong Kim
NAVER Cloud, KAIST AI
Yohan Jo
Seoul National University
Jaewoo Kang
Korea University