🤖 AI Summary
This paper identifies fundamental limitations of large language model (LLM)-driven query expansion (QE) in two prevalent failure scenarios: (1) erroneous expansions caused by gaps in the LLM's knowledge, and (2) bias-prone refinements that narrow retrieval scope when queries are semantically ambiguous. Method: To systematically dissect these failures, the authors first formally distinguish and empirically validate *knowledge insufficiency* and *ambiguity-induced bias* as orthogonal root causes; they then propose a QE evaluation framework that jointly measures knowledge coverage and ambiguity robustness, validated through controlled experiments across multiple benchmarks using both sparse (BM25) and dense (ColBERT) retrieval models. Contribution/Results: Quantitative analysis reveals that for knowledge-poor or highly ambiguous queries, NDCG@10 degrades by 18.7% on average, providing critical failure diagnostics and actionable guidance for improving LLM-augmented retrieval.
📝 Abstract
Query expansion (QE) enhances retrieval by incorporating relevant terms, with large language models (LLMs) offering an effective alternative to traditional rule-based and statistical methods. However, LLM-based QE suffers from a fundamental limitation: it often fails to generate relevant knowledge, degrading search performance. Prior studies have focused on hallucination, yet its underlying cause, LLM knowledge deficiencies, remains underexplored. This paper systematically examines two failure cases in LLM-based QE: (1) when the LLM lacks knowledge of the query topic, leading to incorrect expansions, and (2) when the query is ambiguous, causing biased refinements that narrow search coverage. We conduct controlled experiments across multiple datasets, evaluating the effects of LLM knowledge and query ambiguity on retrieval performance using sparse and dense retrieval models. Our results reveal that LLM-based QE can significantly degrade retrieval effectiveness when knowledge in the LLM is insufficient or query ambiguity is high. We introduce a framework for evaluating QE under these conditions, providing insights into the limitations of LLM-based retrieval augmentation.
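The degradation reported above is measured with NDCG@10. As a minimal illustration of how such a drop is computed (the relevance lists below are hypothetical, not from the paper), the metric discounts graded relevance by rank position and normalizes by the ideal ranking:

```python
import math

def dcg_at_k(rels, k=10):
    # Discounted cumulative gain over the top-k graded relevance labels
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    # Normalize by the DCG of the ideal (descending) ordering
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of the top-10 results for one query,
# before and after an LLM expansion that narrowed search coverage
baseline = [3, 2, 2, 1, 0, 1, 0, 0, 0, 0]
expanded = [2, 1, 0, 1, 0, 0, 0, 0, 0, 0]
relative_drop = 1 - ndcg_at_k(expanded) / ndcg_at_k(baseline)
```

Averaging such per-query relative drops over a benchmark yields the kind of aggregate degradation figure reported in the summary.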