🤖 AI Summary
Existing conversational search systems typically adopt a two-stage, decoupled architecture—separate retrieval and generation models—leading to fragmented contextual understanding and retrieved results that poorly support response generation. Method: This paper proposes the first unified framework for jointly fine-tuning dense retrieval and large language model (LLM)-based response generation. It employs a shared semantic space with task-specific modules, enabling collaborative optimization while preserving functional modularity. We introduce a contrastive-driven joint training mechanism and a context-aware instruction fine-tuning strategy to mitigate training inconsistency and data distribution mismatch. Contribution/Results: Our approach achieves state-of-the-art performance across five conversational search benchmarks, demonstrating bidirectional benefits between retrieval and generation. It significantly improves multi-turn dialogue understanding and response quality, validating the efficacy of end-to-end, jointly optimized conversational search.
📝 Abstract
The rapid advancement of conversational search systems has revolutionized how information is accessed by enabling multi-turn interaction between the user and the system. Existing conversational search systems are usually built with two separate models, one for retrieval and one for generation. This separation prevents the system from leveraging the intrinsic knowledge of both models simultaneously and cannot ensure that retrieval effectively benefits generation. Prior studies on unified models fail to fully address understanding conversational context, managing retrieval independently, and generating responses. In this paper, we explore how to unify dense retrieval and response generation for large language models in conversation. We conduct joint fine-tuning with different objectives and design two mechanisms to reduce inconsistency risks while mitigating data discrepancy. Evaluations on five conversational search datasets demonstrate that our unified model mutually improves both tasks and outperforms existing baselines.
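The joint fine-tuning with different objectives described above could, in its simplest form, combine an in-batch contrastive loss for the retriever with a token-level cross-entropy loss for the generator. The sketch below is a minimal NumPy illustration of that idea, not the paper's actual implementation; the function names, the InfoNCE formulation, the temperature `tau`, and the weighting factor `alpha` are all assumptions.

```python
import numpy as np

def info_nce_loss(q, p, tau=0.05):
    """In-batch contrastive loss (assumed retrieval objective).
    q: (B, d) query embeddings; p: (B, d) passage embeddings.
    The passage at the same batch index is the positive; all
    other passages in the batch serve as negatives."""
    sims = (q @ p.T) / tau                           # (B, B) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)          # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # NLL of the diagonal (positives)

def generation_loss(token_logits, target_ids):
    """Token-level cross-entropy (assumed generation objective).
    token_logits: (T, V) logits per target position; target_ids: (T,)."""
    logits = token_logits - token_logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = log_probs[np.arange(len(target_ids)), target_ids]
    return -picked.mean()

def joint_loss(q, p, token_logits, target_ids, alpha=0.5):
    """Weighted sum of both objectives; alpha (hypothetical hyperparameter)
    balances retrieval against generation during joint fine-tuning."""
    return alpha * info_nce_loss(q, p) + (1 - alpha) * generation_loss(token_logits, target_ids)
```

In a real system the two losses would be computed from a shared backbone with task-specific heads and backpropagated together, so gradients from each objective shape the shared representation.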