🤖 AI Summary
Addressing the challenges of understanding user intent and the limited retrieval effectiveness of complex, long queries, this paper proposes ReDI, a three-stage reasoning-enhanced framework. ReDI leverages large language models (LLMs) to decompose long queries into sub-queries, generate a semantic interpretation for each sub-query, and fuse the multi-path retrieval results, thereby enabling precise intent modeling and improved document matching. It is the first work to systematically investigate LLM-driven collaborative mechanisms for semantic parsing of long queries, and it introduces knowledge distillation to construct a lightweight, deployable model. Evaluated on the BRIGHT and BEIR benchmarks, ReDI consistently outperforms strong baselines under both sparse and dense retrieval paradigms, demonstrating its architecture-agnostic effectiveness and practical applicability.
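The summary above describes the first two stages (decomposition and interpretation) at a high level but gives no implementation details. A minimal sketch of those stages might look like the following, assuming `llm` is any text-in/text-out completion function; the function names and prompts here are illustrative stand-ins, not the paper's actual prompts.

```python
from typing import Callable

def decompose(query: str, llm: Callable[[str], str]) -> list[str]:
    """Stage (i): break a long query into targeted sub-queries."""
    out = llm(
        "Decompose this search query into independent sub-queries, "
        f"one per line:\n{query}"
    )
    # One sub-query per non-empty line of the model's response.
    return [line.strip() for line in out.splitlines() if line.strip()]

def interpret(sub_query: str, llm: Callable[[str], str]) -> str:
    """Stage (ii): enrich a sub-query with a semantic interpretation
    intended to improve query-document matching."""
    return llm(
        "Explain, in one short paragraph, the information need behind "
        f"this query:\n{sub_query}"
    )
```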
📝 Abstract
Accurate inference of user intent is crucial for enhancing document retrieval in modern search engines. While large language models (LLMs) have made significant strides in this area, their effectiveness has predominantly been assessed with short, keyword-based queries. As AI-driven search evolves, long-form queries with intricate intents are becoming more prevalent, yet they remain underexplored in the context of LLM-based query understanding (QU). To bridge this gap, we introduce ReDI: a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. ReDI leverages the reasoning and comprehension capabilities of LLMs in a three-stage pipeline: (i) it breaks down complex queries into targeted sub-queries to accurately capture user intent; (ii) it enriches each sub-query with detailed semantic interpretations to improve query-document matching; and (iii) it independently retrieves documents for each sub-query and employs a fusion strategy to aggregate the results for the final ranking. We compiled a large-scale dataset of real-world complex queries from a major search engine and distilled the query understanding capabilities of teacher models into smaller models for practical application. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms, affirming its effectiveness.
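Stage (iii) retrieves a ranked document list per sub-query and fuses them into the final ranking. The abstract does not name the fusion strategy, so the sketch below uses reciprocal rank fusion (RRF) purely as one plausible choice; `fuse_rankings` and the smoothing constant `k=60` are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict

def fuse_rankings(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Stage (iii): aggregate per-sub-query rankings into one final list.

    RRF rewards documents that rank highly under several sub-queries,
    without needing score calibration across retrieval paths.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# d3 ranks highly under both sub-queries, so it comes out on top.
print(fuse_rankings([["d1", "d3", "d2"], ["d3", "d4"]]))
# ['d3', 'd1', 'd4', 'd2']
```

A rank-based scheme like this is a common default for multi-path aggregation because it works identically over sparse (e.g., BM25) and dense retriever outputs, which matches the paper's claim of architecture-agnostic effectiveness.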