Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion

📅 2026-03-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high inference cost and deployment challenges of large language models (LLMs) for query expansion. The authors propose a retrieval-feedback-driven distillation and preference alignment framework that leverages expansions generated by a teacher model under zero-shot and few-shot settings as both supervision signals and a candidate pool. Preference pairs are automatically constructed based on differences in nDCG@10 scores, and Direct Preference Optimization (DPO) is employed to align the student model's generation behavior with retrieval objectives. This study is the first to integrate retrieval-metric-driven preference construction with DPO, enabling efficient and retrieval-friendly compression of query expansion. Experiments show that the distilled Qwen3-4B model achieves 97% of the teacher's nDCG@10 performance on TREC DL19 while significantly reducing inference overhead, with cross-lingual effectiveness further validated on MIRACL-zh.
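The DPO step mentioned above uses the standard Direct Preference Optimization objective. For a query \(x\) with a preferred (chosen) expansion \(y_w\) and a dispreferred (rejected) expansion \(y_l\), the general form is:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
```

Here \(\pi_\theta\) is the student model being aligned, \(\pi_{\mathrm{ref}}\) is a frozen reference policy (typically the distilled student before alignment), and \(\beta\) is a temperature hyperparameter; the paper's exact choice of reference policy and \(\beta\) is not stated in this summary.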

๐Ÿ“ Abstract
Large language models have recently enabled a generative paradigm for query expansion, but their high inference cost makes direct deployment difficult in practical retrieval systems. To address this issue, a retrieval-feedback-driven distillation and preference-alignment framework is proposed to transfer retrieval-friendly expansion behavior from a strong teacher model to a compact student model. Rather than relying on few-shot exemplars at inference time, the framework first leverages two complementary types of teacher-generated expansions, produced under zero-shot and few-shot prompting conditions, as supervision signals for distillation and as candidate pools for preference construction. A retrieval-metric-driven strategy is then introduced to automatically form chosen/rejected expansion pairs according to nDCG@10 differences, and Direct Preference Optimization is applied to explicitly align generation preferences with retrieval objectives. Experiments on TREC DL19/20/21 and MIRACL-zh show that the proposed approach preserves strong retrieval effectiveness while substantially reducing inference cost. In particular, the distilled Qwen3-4B model reaches about 97% of the teacher (DeepSeek-685B) model's nDCG@10 performance on DL19, and remains effective on the Chinese MIRACL-zh benchmark, demonstrating strong practicality across both English and Chinese retrieval settings.
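The retrieval-metric-driven preference construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: function and variable names are hypothetical, and the nDCG@10 margin threshold is an assumed value. Each teacher-generated expansion (zero-shot or few-shot) is assumed to have already been scored by running retrieval with the expanded query and computing nDCG@10.

```python
from itertools import combinations

def build_preference_pairs(candidates, margin=0.05):
    """Form chosen/rejected pairs from teacher-generated expansions.

    `candidates` is a list of (expansion_text, ndcg_at_10) tuples, where
    each score comes from retrieving with the expanded query (the
    retrieval-feedback signal). The `margin` value is illustrative:
    pairs whose nDCG@10 gap is below it are discarded as too noisy
    to encode a reliable preference.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) < margin:
            continue  # scores too close: skip this pair
        chosen, rejected = ((text_a, text_b) if score_a > score_b
                            else (text_b, text_a))
        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs

# Example: three candidate expansions for one query, with their
# (hypothetical) nDCG@10 scores from the retrieval run.
candidates = [
    ("expansion A", 0.62),
    ("expansion B", 0.55),
    ("expansion C", 0.61),
]
pairs = build_preference_pairs(candidates)
# A vs. B (gap 0.07) and C vs. B (gap 0.06) exceed the margin;
# A vs. C (gap 0.01) is discarded.
```

The resulting `chosen`/`rejected` pairs are exactly the format consumed by a DPO trainer, which then pushes the student toward expansions that improve retrieval quality.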
Problem

Research questions and friction points this paper is trying to address.

query expansion
large language models
inference cost
retrieval systems
efficient deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

query expansion
knowledge distillation
preference alignment
retrieval feedback
direct preference optimization
Minghan Li
School of Computer Science and Technology, Soochow University, China
Guodong Zhou
Soochow University, China
Natural Language Processing · Artificial Intelligence