Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a fully automatic, unsupervised, domain-adaptive query expansion framework that addresses the limitations of existing methods (reliance on manual prompting, handcrafted example selection, or a single language model), which suffer from poor scalability and weak cross-domain transferability. The approach first constructs an in-domain example pool via BM25-MonoT5 pseudo-relevance feedback, then selects diverse examples with a training-free clustering strategy. It further introduces an ensemble mechanism in which two large language models (LLMs) independently generate expansions that a third LLM then refines into a single coherent expansion, all without labeled data. Evaluated on TREC DL20, DBpedia, and SciFact, the method delivers statistically significant gains over BM25, Rocchio, zero-shot, and fixed few-shot baselines, with robust performance and strong cross-domain generalization.
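The two-generator-plus-refiner ensemble described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `generator_a`, `generator_b`, and `refiner` callables are hypothetical placeholders that a real system would back with two heterogeneous LLM APIs and a third refinement model, and the prompt wording is invented for the example.

```python
def ensemble_expand(query, generator_a, generator_b, refiner):
    """Two heterogeneous LLMs propose expansions independently;
    a third LLM consolidates them into one coherent expansion."""
    expansion_a = generator_a(f"Expand the search query: {query}")
    expansion_b = generator_b(f"Expand the search query: {query}")
    refine_prompt = (
        f"Query: {query}\n"
        f"Candidate expansion 1: {expansion_a}\n"
        f"Candidate expansion 2: {expansion_b}\n"
        "Merge these into one coherent expansion."
    )
    return refiner(refine_prompt)


# Stub "LLMs" for illustration only; real calls would hit model APIs.
def stub_a(prompt):
    return "climate change global warming emissions"

def stub_b(prompt):
    return "climate change greenhouse gases policy"

def stub_refiner(prompt):
    return "climate change global warming greenhouse gas emissions policy"

print(ensemble_expand("climate change", stub_a, stub_b, stub_refiner))
```

The refined expansion would then be appended to the original query before retrieval, as in standard LLM-based query expansion setups.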

📝 Abstract
Query expansion (QE) with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain shift. We present an automated, domain-adaptive QE framework that builds in-domain exemplar pools by harvesting pseudo-relevant passages with a BM25-MonoT5 pipeline. A training-free, cluster-based strategy selects diverse demonstrations, yielding strong and stable in-context QE without supervision. To further exploit model complementarity, we introduce a two-LLM ensemble in which two heterogeneous LLMs independently generate expansions and a refinement LLM consolidates them into one coherent expansion. Across TREC DL20, DBpedia, and SciFact, the refined ensemble delivers consistent and statistically significant gains over BM25, Rocchio, zero-shot, and fixed few-shot baselines. The framework offers a reproducible testbed for exemplar selection and multi-LLM generation, and a practical, label-free solution for real-world QE.
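The training-free diverse-demonstration selection stage can be sketched in a few lines. The abstract does not specify the exact clustering algorithm, so this hedged example uses greedy farthest-point sampling over passage embeddings as a stand-in training-free diversity strategy; the `cosine` helper and the toy 2-D embeddings are illustrative only.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_diverse_exemplars(embeddings, k):
    """Greedy farthest-point selection: start from the first candidate,
    then repeatedly add the candidate least similar to anything chosen,
    so near-duplicate pseudo-relevant passages are not picked twice."""
    chosen = [0]
    while len(chosen) < min(k, len(embeddings)):
        best_idx, best_sim = None, float("inf")
        for i in range(len(embeddings)):
            if i in chosen:
                continue
            # Similarity to the closest already-chosen exemplar.
            sim = max(cosine(embeddings[i], embeddings[j]) for j in chosen)
            if sim < best_sim:
                best_idx, best_sim = i, sim
        chosen.append(best_idx)
    return chosen


# Toy pool: three near-duplicate directions and one outlier.
pool = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.05], [0.0, 1.0]]
print(select_diverse_exemplars(pool, 2))  # → [0, 3]: the outlier is picked second
```

In the full pipeline, the pool entries would be pseudo-relevant passages retrieved by BM25 and reranked by MonoT5, and the selected indices would become the few-shot demonstrations in the expansion prompt.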
Problem

Research questions and friction points this paper is trying to address.

query expansion
large language models
domain shift
exemplar selection
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

query expansion
in-domain exemplar construction
multi-LLM ensemble
cluster-based demonstration selection
LLM-based refinement
Minghan Li
School of Computer Science and Technology, Soochow University, Suzhou, China
Ercong Nie
LMU Munich, MCML
Computational Linguistics, Natural Language Processing
Siqi Zhao
School of Computer Science and Technology, Soochow University, Suzhou, China
Tongna Chen
School of Computer Science and Technology, Soochow University, Suzhou, China
Huiping Huang
Chalmers University of Technology, Gothenburg, Sweden
Guodong Zhou
Soochow University, China
Natural Language Processing, Artificial Intelligence