🤖 AI Summary
Addressing the challenges of understanding user intent and the limited retrieval effectiveness of complex, long queries, this paper proposes ReDI, a three-stage reasoning-enhanced framework. ReDI leverages large language models (LLMs) to decompose long queries into sub-queries, generate a semantic interpretation for each sub-query, and fuse the multi-path retrieval results, thereby enabling precise intent modeling and improved document matching. It is the first work to systematically investigate LLM-driven collaborative mechanisms for semantic parsing of long queries, and it introduces knowledge distillation to construct a lightweight, deployable model. Evaluated on the BRIGHT and BEIR benchmarks, ReDI consistently outperforms strong baselines under both sparse and dense retrieval paradigms, demonstrating its architecture-agnostic effectiveness and practical applicability.
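The summary above describes the first two stages (decomposition and interpretation) at a high level but gives no implementation details. A minimal sketch of those stages might look like the following, assuming `llm` is any text-in/text-out completion function; the function names and prompts here are illustrative stand-ins, not the paper's actual prompts.

```python
from typing import Callable

def decompose(query: str, llm: Callable[[str], str]) -> list[str]:
    """Stage (i): break a long query into targeted sub-queries."""
    out = llm(
        "Decompose this search query into independent sub-queries, "
        f"one per line:\n{query}"
    )
    # One sub-query per non-empty line of the model's response.
    return [line.strip() for line in out.splitlines() if line.strip()]

def interpret(sub_query: str, llm: Callable[[str], str]) -> str:
    """Stage (ii): enrich a sub-query with a semantic interpretation
    intended to improve query-document matching."""
    return llm(
        "Explain, in one short paragraph, the information need behind "
        f"this query:\n{sub_query}"
    )
```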
📝 Abstract
Accurate inference of user intent is crucial for enhancing document retrieval in modern search engines. While large language models (LLMs) have made significant strides in this area, their effectiveness has predominantly been assessed with short, keyword-based queries. As AI-driven search evolves, long-form queries with intricate intents are becoming more prevalent, yet they remain underexplored in the context of LLM-based query understanding (QU). To bridge this gap, we introduce ReDI: a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. ReDI leverages the reasoning and comprehension capabilities of LLMs in a three-stage pipeline: (i) it breaks down complex queries into targeted sub-queries to accurately capture user intent; (ii) it enriches each sub-query with detailed semantic interpretations to improve query-document matching; and (iii) it independently retrieves documents for each sub-query and employs a fusion strategy to aggregate the results for the final ranking. We compiled a large-scale dataset of real-world complex queries from a major search engine and distilled the query understanding capabilities of teacher models into smaller models for practical application. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms, affirming its effectiveness.
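Stage (iii) retrieves a ranked document list per sub-query and fuses them into the final ranking. The abstract does not name the fusion strategy, so the sketch below uses reciprocal rank fusion (RRF) purely as one plausible choice; `fuse_rankings` and the smoothing constant `k=60` are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict

def fuse_rankings(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Stage (iii): aggregate per-sub-query rankings into one final list.

    RRF rewards documents that rank highly under several sub-queries,
    without needing score calibration across retrieval paths.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# d3 ranks highly under both sub-queries, so it comes out on top.
print(fuse_rankings([["d1", "d3", "d2"], ["d3", "d4"]]))
# ['d3', 'd1', 'd4', 'd2']
```

A rank-based scheme like this is a common default for multi-path aggregation because it works identically over sparse (e.g., BM25) and dense retriever outputs, which matches the paper's claim of architecture-agnostic effectiveness.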