Efficient Conversational Search via Topical Locality in Dense Retrieval

📅 2025-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high response latency of conversational search, which degrades user experience, this paper exploits topical locality: the tendency of queries within a conversation to stay on related topics. Using query embedding similarities, the method dynamically restricts dense retrieval to semantically relevant document clusters, shrinking the candidate space searched at each turn. Evaluated on TREC CAsT 2019 and 2020 with multiple embedding models and vector indexes, it achieves up to 10.4× speedup with little loss in effectiveness, and 4.4× speedup with no loss. The core contributions are (1) modeling topical locality in conversational contexts and (2) restricting the search space to relevant document clusters at query time, enabling efficient yet accurate dense retrieval.

📝 Abstract

Pre-trained language models have been widely exploited to learn dense representations of documents and queries for information retrieval. While previous efforts have primarily focused on improving effectiveness and user satisfaction, response time remains a critical bottleneck of conversational search systems. To address this, we exploit the topical locality inherent in conversational queries, i.e., the tendency of queries within a conversation to focus on related topics. By leveraging query embedding similarities, we dynamically restrict the search space to semantically relevant document clusters, reducing computational complexity without compromising retrieval quality. We evaluate our approach on the TREC CAsT 2019 and 2020 datasets using multiple embedding models and vector indexes, achieving improvements in processing speed of up to 10.4X with little loss in performance (4.4X without any loss). Our results show that the proposed system effectively handles complex, multiturn queries with high precision and efficiency, offering a practical solution for real-time conversational search.
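The idea of restricting the search space through query embedding similarity can be illustrated with a small sketch. This is not the authors' implementation: the class name `LocalitySearcher`, the `nprobe` and `reuse_threshold` parameters, the rule of reusing the previous clusters when the cosine similarity to an anchor query is high, and the synthetic data are all assumptions made for illustration. It only shows one plausible way a conversation that stays on topic could skip cluster selection and search a small part of the collection.

```python
import numpy as np


def normalize(x):
    """L2-normalize vectors along the last axis so dot products are cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class LocalitySearcher:
    """Toy cluster-restricted dense searcher (hypothetical, for illustration)."""

    def __init__(self, doc_vecs, centroids, nprobe=1, reuse_threshold=0.8):
        self.docs = normalize(doc_vecs)
        self.centroids = normalize(centroids)
        # Assign every document to its most similar centroid.
        self.assign = np.argmax(self.docs @ self.centroids.T, axis=1)
        self.nprobe = nprobe
        self.reuse_threshold = reuse_threshold  # assumed cosine cutoff
        self.anchor_query = None   # query that last triggered cluster selection
        self.anchor_clusters = None

    def search(self, query, k=3):
        q = normalize(query)
        # Topical locality: if the query is close to the anchor query,
        # keep searching the clusters already selected for the conversation.
        reuse = (
            self.anchor_query is not None
            and float(q @ self.anchor_query) >= self.reuse_threshold
        )
        if reuse:
            clusters = self.anchor_clusters
        else:
            # Topic shift (or first turn): pick the nprobe closest clusters.
            clusters = np.argsort(-(self.centroids @ q))[: self.nprobe]
            self.anchor_query, self.anchor_clusters = q, clusters
        # Score only the documents that belong to the selected clusters.
        cand = np.flatnonzero(np.isin(self.assign, clusters))
        sims = self.docs[cand] @ q
        top = cand[np.argsort(-sims)[:k]]
        return top, len(cand), reuse


# Synthetic collection: 4 topics, 50 documents each, in a 4-d space.
rng = np.random.default_rng(0)
centroids = np.eye(4)
docs = np.vstack([c + 0.05 * rng.standard_normal((50, 4)) for c in centroids])

searcher = LocalitySearcher(docs, centroids)
conversation = [
    np.array([1.0, 0.1, 0.0, 0.0]),  # turn 1: topic 0
    np.array([0.9, 0.2, 0.0, 0.0]),  # turn 2: still topic 0
    np.array([0.0, 0.0, 1.0, 0.1]),  # turn 3: shift to topic 2
]
for turn, query in enumerate(conversation, start=1):
    top, n_scored, reused = searcher.search(query)
    print(f"turn {turn}: scored {n_scored}/{len(docs)} docs, reused={reused}")
```

In this sketch the anchor is the query that last triggered cluster selection, so gradual drift is measured against the start of the current topic rather than against the immediately preceding turn. Whether that matches the paper's policy is not stated in the abstract.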

Problem

Research questions and friction points this paper is trying to address.

Reducing response time in conversational search systems

Leveraging topical locality to optimize search space

Balancing speed and accuracy in dense retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages topical locality in conversational queries

Dynamically restricts search space using embedding similarities

Improves processing speed by up to 10.4X with little performance loss (4.4X with none)

🔎 Similar Papers

AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment