Single-Turn LLM Reformulation Powered Multi-Stage Hybrid Re-Ranking for Tip-of-the-Tongue Known-Item Retrieval

📅 2026-02-10
🤖 AI Summary
This work addresses the challenge of low initial retrieval quality in "tip-of-the-tongue" (ToT) scenarios, where users provide vague queries due to incomplete recall. To bridge this gap, the authors propose a lightweight, single-round query rewriting method using an 8B-parameter large language model without fine-tuning, effectively transforming ambiguous descriptions into precise retrieval intents. The approach is integrated with a multi-stage hybrid re-ranking pipeline that combines BM25 sparse retrieval, dense and late-interaction models (Contriever, E5-large-v2, and ColBERTv2), a monoT5 cross-encoder, and a Qwen2.5-72B listwise re-ranker to fully exploit downstream re-ranking potential. Evaluated on the TREC-ToT 2025 dataset, query rewriting alone improves Recall by 20.61%, while the full pipeline further boosts nDCG@10, MRR, and MAP@10 by 33.88%, 29.92%, and 29.98%, respectively.

📝 Abstract
Retrieving known items from vague descriptions, a task known as Tip-of-the-Tongue (ToT) retrieval, remains a significant challenge. We propose using a single call to a generic 8B-parameter LLM for query reformulation, bridging the gap between ill-formed ToT queries and specific information needs. This method is particularly effective where standard Pseudo-Relevance Feedback fails due to poor initial recall. Crucially, our LLM is not fine-tuned for ToT or specific domains, demonstrating that gains stem from our prompting strategy rather than model specialization. Rewritten queries feed a multi-stage pipeline: sparse retrieval (BM25), dense/late-interaction reranking (Contriever, E5-large-v2, ColBERTv2), monoT5 cross-encoding, and list-wise reranking (Qwen2.5-72B). Experiments on the TREC-ToT 2025 dataset show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves Recall by 20.61%. Subsequent reranking improves nDCG@10 by 33.88%, MRR by 29.92%, and MAP@10 by 29.98%, offering a cost-effective intervention that unlocks the potential of downstream rankers. Code and data: https://github.com/debayan1405/TREC-TOT-2025
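The pipeline described above (rewrite once, retrieve sparsely, then re-rank progressively smaller candidate pools) can be sketched in miniature. This is an illustration, not the authors' code: the 8B LLM rewrite, Contriever/E5/ColBERTv2, monoT5, and Qwen2.5-72B stages are all replaced here by tiny stand-in scorers over a toy corpus, so only the control flow of the multi-stage funnel is shown.

```python
import math
from collections import Counter

# Toy corpus standing in for the TREC-ToT document collection.
CORPUS = {
    "d1": "a movie about a shark attacking a beach town",
    "d2": "a film where a robot learns to love",
    "d3": "documentary on deep sea creatures and sharks",
}

def rewrite_query(vague_query: str) -> str:
    # Placeholder for the paper's single zero-shot LLM call; a real system
    # would prompt an 8B model to turn the vague recollection into a
    # precise retrieval query.
    return vague_query.replace("that movie with", "film about")

def bm25_scores(query: str, corpus: dict, k1: float = 1.5, b: float = 0.75) -> dict:
    # Minimal BM25 over whitespace tokens (stand-in for a real sparse index).
    docs = {d: t.split() for d, t in corpus.items()}
    avgdl = sum(len(t) for t in docs.values()) / len(docs)
    n_docs = len(docs)
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    scores = {}
    for d, toks in docs.items():
        tf = Counter(toks)
        s = 0.0
        for w in query.split():
            if w not in tf:
                continue
            idf = math.log(1 + (n_docs - df[w] + 0.5) / (df[w] + 0.5))
            norm = 1 - b + b * len(toks) / avgdl
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * norm)
        scores[d] = s
    return scores

def rerank(candidates, scorer, keep: int) -> list:
    # Generic re-ranking stage: re-score the current pool, keep the top `keep`.
    return sorted(candidates, key=scorer, reverse=True)[:keep]

# Stage 0: single-turn query reformulation.
query = rewrite_query("that movie with the shark on the beach")
# Stage 1: sparse retrieval (BM25) builds the initial pool.
sparse = bm25_scores(query, CORPUS)
pool = rerank(CORPUS, sparse.get, keep=3)
# Stage 2: "dense" re-ranking stand-in, scoring by word overlap with the
# rewritten query (a real system would use Contriever/E5/ColBERTv2 here,
# then monoT5 and a listwise LLM on ever-smaller pools).
overlap = lambda d: len(set(CORPUS[d].split()) & set(query.split()))
pool = rerank(pool, overlap, keep=2)
print(pool)  # → ['d1', 'd3']
```

The key design point the paper argues is that the cheap rewrite at stage 0 is what makes every later stage effective: if the initial pool misses the target item, no amount of downstream re-ranking can recover it.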
Problem

Research questions and friction points this paper is trying to address.

Tip-of-the-Tongue retrieval
known-item retrieval
vague query
query reformulation
information retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

query reformulation
tip-of-the-tongue retrieval
large language model
multi-stage re-ranking
zero-shot prompting
Debayan Mukhopadhyay
Independent Researcher
Utshab Kumar Ghosh
Missouri University of Science and Technology
Shubham Chatterjee
Missouri University of Science and Technology, Rolla, MO, USA
Information Retrieval, Machine Learning, Natural Language Processing