🤖 AI Summary
To address high response latency in conversational search, which degrades the user experience, this paper exploits the topical locality of conversational queries: queries within a conversation tend to focus on related topics. It proposes a dense retrieval acceleration method that organizes the document collection into semantic clusters and uses query embedding similarity to dynamically restrict each query's search to the relevant clusters, reducing the number of document vectors that must be scored. The method is evaluated with multiple embedding models and vector indexes on TREC CAsT 2019 and 2020 and can be tuned for either lossless or more aggressive acceleration: it achieves up to a 10.4× speedup with only a 4.4% drop in NDCG@3, and a 4.4× speedup with no loss in effectiveness. Retrieval quality is maintained on complex, multi-turn queries. The core contributions are (1) the dynamic modeling of topical locality in conversational contexts and (2) semantic-aware, real-time restriction of the candidate search space, enabling dense retrieval that is both efficient and accurate.
📝 Abstract
Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck in conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving speedups of up to 10.4× with little loss in effectiveness (4.4× with no loss). Our results show that the proposed system effectively handles complex, multi-turn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
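The core idea, restricting each turn's search to a few semantically relevant document clusters and keeping them while the conversation stays on topic, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the clusters were computed offline (e.g. with k-means). The class name `ConversationalClusterSearch`, the `n_probe` parameter, and the `topic_threshold` rule, which detects a topic shift by comparing consecutive query embeddings, are all illustrative choices. Pure Python keeps it dependency-free; a real system would use a vector index library.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale to unit length so that the dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


class ConversationalClusterSearch:
    """Hypothetical sketch of cluster-restricted dense retrieval for one conversation.

    Assumes documents were clustered offline and that doc_cluster[i] is the
    cluster id of document i. Not the paper's actual code.
    """

    def __init__(self, centroids: List[Vector], doc_vectors: List[Vector],
                 doc_cluster: List[int], n_probe: int = 2,
                 topic_threshold: float = 0.8) -> None:
        self.centroids = [normalize(c) for c in centroids]
        self.docs = [normalize(d) for d in doc_vectors]
        # Inverted lists: cluster id -> ids of the documents assigned to it.
        self.lists: Dict[int, List[int]] = {}
        for doc_id, cluster_id in enumerate(doc_cluster):
            self.lists.setdefault(cluster_id, []).append(doc_id)
        self.n_probe = n_probe                  # clusters probed after a topic shift
        self.topic_threshold = topic_threshold  # hypothetical "same topic" cut-off
        self.prev_query: Optional[List[float]] = None
        self.active: List[int] = []             # clusters searched in the current turn
        self.last_scored = 0                    # documents scored in the last turn

    def _nearest_clusters(self, q: List[float]) -> List[int]:
        ranked = sorted(range(len(self.centroids)),
                        key=lambda c: dot(q, self.centroids[c]), reverse=True)
        return ranked[: self.n_probe]

    def search(self, query: Vector, k: int = 3) -> List[Tuple[int, float]]:
        q = normalize(query)
        same_topic = (self.prev_query is not None
                      and dot(q, self.prev_query) >= self.topic_threshold)
        if not same_topic:
            # First turn or topic shift: pick the clusters closest to this query.
            self.active = self._nearest_clusters(q)
        # Otherwise exploit topical locality and keep the conversation's clusters.
        self.prev_query = q
        # Only documents in the active clusters are scored, not the whole collection.
        candidates = [d for c in self.active for d in self.lists.get(c, [])]
        self.last_scored = len(candidates)
        scored = ((d, dot(q, self.docs[d])) for d in candidates)
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, 0.1]]
    search = ConversationalClusterSearch(centroids, docs, [0, 0, 1, 1, 2], n_probe=1)
    for turn in ([1.0, 0.05], [0.95, 0.3], [0.0, 1.0]):
        hits = search.search(turn, k=2)
        print(search.active, search.last_scored, [d for d, _ in hits])
```

With the toy data above, the second query is close to the first and is answered from cluster 0 again. The third query falls below the similarity threshold, so the search moves to cluster 1. In each turn only the two documents of the active cluster are scored rather than all five. This reduction in scored vectors is where the speedup of this kind of approach comes from.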