AI Summary
In conversational query reformulation (CQR), a key practical bottleneck is the scarcity of ground-truth reference passages for supervision. To address this, we propose DualReform, a reference-free preference optimization framework. DualReform introduces (1) response-based inference, which uses the responses in ordinary query-response conversational datasets as proxies to infer pseudo reference passages, and (2) response refinement via the dual role of CQR, in which a CQR model refines those responses based on the objectives that response refinement and CQR share. Without relying on any reference passages, DualReform achieves 96.9%–99.1% of the retrieval accuracy attainable only with reference passages and surpasses the state-of-the-art method by up to 31.6%. These results indicate substantial progress in reducing CQR's dependency on hard-to-acquire reference passages while maintaining competitive retrieval performance.
Abstract
Conversational query reformulation (CQR) has become indispensable for improving retrieval in dialogue-based applications. However, existing approaches typically rely on reference passages for optimization, which are impractical to acquire in real-world scenarios. To address this limitation, we introduce DualReform, a novel reference-free preference optimization framework that generates pseudo reference passages from commonly encountered conversational datasets containing only queries and responses. DualReform attains this goal through two key innovations: (1) response-based inference, where responses serve as proxies to infer pseudo reference passages, and (2) response refinement via the dual role of CQR, where a CQR model refines responses based on the objectives shared between response refinement and CQR. Despite not relying on reference passages, DualReform achieves 96.9–99.1% of the retrieval accuracy attainable only with reference passages and surpasses the state-of-the-art method by up to 31.6%.
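
To make the idea of response-based inference concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a simple token-overlap retriever in place of a real one, and it assumes that preference pairs are formed by checking how highly each candidate reformulation ranks the pseudo reference passage. The names `score`, `rank`, `pseudo_reference`, and `preference_pair`, as well as the example corpus, are hypothetical. The response refinement step is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Toy relevance score (assumption): number of shared tokens."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def rank(query, corpus):
    """Passage indices ordered from most to least relevant (ties keep corpus order)."""
    return sorted(range(len(corpus)),
                  key=lambda i: score(query, corpus[i]),
                  reverse=True)


def pseudo_reference(response, corpus):
    """Use the response as a proxy query; its top passage is the pseudo reference."""
    return rank(response, corpus)[0]


def preference_pair(candidates, pseudo_ref, corpus):
    """Prefer the candidate that ranks the pseudo reference highest (assumption)."""
    ordered = sorted(candidates,
                     key=lambda c: rank(c, corpus).index(pseudo_ref))
    return ordered[0], ordered[-1]  # (chosen, rejected)


# Hypothetical data: a dataset turn has only a query and a response, no reference passage.
corpus = [
    "The Statue of Liberty was completed and dedicated in 1886.",
    "The Eiffel Tower in Paris was completed in 1889.",
    "Paris is the capital of France.",
]
response = "The Eiffel Tower was completed in 1889."
candidates = [
    "When was it completed?",                # context-dependent query
    "When was the Eiffel Tower completed?",  # self-contained reformulation
]

ref = pseudo_reference(response, corpus)
chosen, rejected = preference_pair(candidates, ref, corpus)
print(ref, chosen, rejected, sep=" | ")
```

In this toy setting the response points to the Eiffel Tower passage, and the self-contained reformulation is preferred over the context-dependent one because it ranks that passage first. A pair such as `(chosen, rejected)` is the kind of signal a preference optimization objective could consume, though the exact construction used by DualReform is not specified in the abstract.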