Upcycling Candidate Tokens of Large Language Models for Query Expansion

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To balance diversity and computational efficiency in large language model (LLM)-based query expansion, this paper proposes Candidate Token Query Expansion (CTQE). CTQE leverages high-confidence candidate tokens that are considered but not selected during a single autoregressive decoding pass of an LLM, and turns them into diverse, semantically relevant expansion terms via context-aware relevance filtering, token aggregation, and reweighting. Crucially, CTQE requires no additional forward passes, so it introduces no extra latency or computational overhead. Evaluated on multiple open-domain retrieval benchmarks, CTQE substantially outperforms conventional query expansion methods and matches or exceeds costly LLM-based baselines that require multiple LLM invocations (e.g., achieving competitive or superior Recall@100), demonstrating both high efficiency and strong effectiveness.
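A minimal sketch of the core idea, assuming access to the per-step candidate-token probabilities an LLM produces during one decoding pass. The function name, thresholds, and the use of a simple probability cutoff in place of the paper's context-aware relevance filtering are illustrative assumptions, not the authors' implementation:

```python
from collections import defaultdict

def ctqe_expand(step_candidates, prob_threshold=0.05, top_k=5):
    """Aggregate candidate tokens from one decoding pass into weighted
    expansion terms (illustrative sketch; the paper's relevance
    filtering is simplified here to a probability cutoff).

    step_candidates: one dict per decoding step, mapping each
    candidate token to its model probability at that step.
    """
    weights = defaultdict(float)
    for candidates in step_candidates:
        # Keep only the top-k candidates at this step by probability.
        kept = sorted(candidates.items(), key=lambda kv: -kv[1])[:top_k]
        for token, p in kept:
            if p >= prob_threshold:
                # Reweight: accumulate probability mass across steps,
                # so tokens surfacing repeatedly gain weight.
                weights[token] += p
    # Normalize so expansion-term weights sum to 1.
    total = sum(weights.values()) or 1.0
    return {tok: w / total
            for tok, w in sorted(weights.items(), key=lambda kv: -kv[1])}

# Toy example: candidate distributions from two decoding steps.
steps = [
    {"aspirin": 0.40, "ibuprofen": 0.30, "rest": 0.02},
    {"ibuprofen": 0.25, "therapy": 0.20, "rest": 0.01},
]
terms = ctqe_expand(steps)
```

Because every candidate token was already scored during the single forward pass, this aggregation adds no extra LLM inference; the low-probability token `"rest"` is dropped by the threshold, while `"ibuprofen"`, surfacing at both steps, ends up with the largest expansion weight.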

📝 Abstract
Query Expansion (QE) improves retrieval performance by enriching queries with related terms. Recently, Large Language Models (LLMs) have been used for QE, but existing methods face a trade-off: generating diverse terms boosts performance but increases computational cost. To address this challenge, we propose Candidate Token Query Expansion (CTQE), which extracts diverse and relevant terms from a single LLM decoding pass by leveraging unselected candidate tokens. These tokens, though not part of the final output, are conditioned on the full query and capture useful information. By aggregating them, CTQE achieves both relevance and diversity without extra inference, reducing overhead and latency. Experiments show that CTQE delivers strong retrieval performance with significantly lower cost, outperforming or comparable to more expensive methods. Code is available at: https://github.com/bluejeans8/CTQE
Problem

Research questions and friction points this paper is trying to address.

Reducing computational cost in query expansion with LLMs
Maintaining term diversity without extra inference overhead
Leveraging unselected candidate tokens for efficient expansion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages unselected candidate tokens from LLMs
Extracts diverse terms in single decoding pass
Reduces computational cost without extra inference