Retrieval-Feedback-Driven Distillation and Preference Alignment for Efficient LLM-based Query Expansion

📅 2026-03-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high inference cost and deployment challenges of large language models (LLMs) for query expansion. The authors propose a retrieval-feedback-driven distillation and preference alignment framework that leverages expansions generated by a teacher model under zero-shot and few-shot settings as both supervision signals and a candidate pool. Preference pairs are automatically constructed based on differences in nDCG@10 scores, and Direct Preference Optimization (DPO) is employed to align the student model's generation behavior with retrieval objectives. This study is the first to integrate retrieval-metric-driven preference construction with DPO, enabling efficient and retrieval-friendly compression of query expansion. Experiments show that the distilled Qwen3-4B model achieves 97% of the teacher's nDCG@10 performance on TREC DL19 while significantly reducing inference overhead, with cross-lingual effectiveness further validated on MIRACL-zh.
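The DPO step mentioned above uses the standard Direct Preference Optimization objective. For a query \(x\) with a preferred (chosen) expansion \(y_w\) and a dispreferred (rejected) expansion \(y_l\), the general form is:

```latex
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
```

Here \(\pi_\theta\) is the student model being aligned, \(\pi_{\mathrm{ref}}\) is a frozen reference policy (typically the distilled student before alignment), and \(\beta\) is a temperature hyperparameter; the paper's exact choice of reference policy and \(\beta\) is not stated in this summary.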

๐Ÿ“ Abstract
Large language models have recently enabled a generative paradigm for query expansion, but their high inference cost makes direct deployment difficult in practical retrieval systems. To address this issue, a retrieval-feedback-driven distillation and preference-alignment framework is proposed to transfer retrieval-friendly expansion behavior from a strong teacher model to a compact student model. Rather than relying on few-shot exemplars at inference time, the framework first leverages two complementary types of teacher-generated expansions, produced under zero-shot and few-shot prompting conditions, as supervision signals for distillation and as candidate pools for preference construction. A retrieval-metric-driven strategy is then introduced to automatically form chosen/rejected expansion pairs according to nDCG@10 differences, and Direct Preference Optimization is applied to explicitly align generation preferences with retrieval objectives. Experiments on TREC DL19/20/21 and MIRACL-zh show that the proposed approach preserves strong retrieval effectiveness while substantially reducing inference cost. In particular, the distilled Qwen3-4B model reaches about 97% of the teacher (DeepSeek-685B) model's nDCG@10 performance on DL19, and remains effective on the Chinese MIRACL-zh benchmark, demonstrating strong practicality across both English and Chinese retrieval settings.
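The retrieval-metric-driven preference construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: function and variable names are hypothetical, and the nDCG@10 margin threshold is an assumed value. Each teacher-generated expansion (zero-shot or few-shot) is assumed to have already been scored by running retrieval with the expanded query and computing nDCG@10.

```python
from itertools import combinations

def build_preference_pairs(candidates, margin=0.05):
    """Form chosen/rejected pairs from teacher-generated expansions.

    `candidates` is a list of (expansion_text, ndcg_at_10) tuples, where
    each score comes from retrieving with the expanded query (the
    retrieval-feedback signal). The `margin` value is illustrative:
    pairs whose nDCG@10 gap is below it are discarded as too noisy
    to encode a reliable preference.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) < margin:
            continue  # scores too close: skip this pair
        chosen, rejected = ((text_a, text_b) if score_a > score_b
                            else (text_b, text_a))
        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs

# Example: three candidate expansions for one query, with their
# (hypothetical) nDCG@10 scores from the retrieval run.
candidates = [
    ("expansion A", 0.62),
    ("expansion B", 0.55),
    ("expansion C", 0.61),
]
pairs = build_preference_pairs(candidates)
# A vs. B (gap 0.07) and C vs. B (gap 0.06) exceed the margin;
# A vs. C (gap 0.01) is discarded.
```

The resulting `chosen`/`rejected` pairs are exactly the format consumed by a DPO trainer, which then pushes the student toward expansions that improve retrieval quality.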
Problem

Research questions and friction points this paper is trying to address.

query expansion
large language models
inference cost
retrieval systems
efficient deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

query expansion
knowledge distillation
preference alignment
retrieval feedback
direct preference optimization
Minghan Li
School of Computer Science and Technology, Soochow University, China
Guodong Zhou
Soochow University, China
Natural Language Processing · Artificial Intelligence