RL-based Query Rewriting with Distilled LLM for online E-Commerce Systems

📅 2025-01-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In e-commerce search, query rewriting (QR) faces a fundamental trade-off between efficiency and semantic adaptability: discriminative models lack linguistic flexibility, while large language models (LLMs) incur high online latency, substantial computational cost, and semantic drift. Method: We propose a lightweight distillation framework synergized with online reinforcement learning. Specifically, we introduce an LLM-as-judge mechanism to generate scalable, human-aligned reward signals; integrate offline knowledge distillation with PPO-based online policy optimization; and explicitly model e-commerce search-specific characteristics (e.g., intent volatility, product taxonomy). Results: Evaluated on the Amazon ESCI dataset, our approach significantly improves relevance, diversity, and dynamic adaptability. It reduces online inference latency by 76% and boosts rewrite accuracy by 19.3%, achieving a favorable balance between low-latency deployment and semantic freshness.
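The LLM-as-judge reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `judge_llm` is a hypothetical stand-in for a real LLM API call, and the 0-5 scoring rubric and reward scaling are assumptions.

```python
# Sketch of an LLM-as-judge reward for query rewriting (illustrative only).
# In the paper's setting, `judge_llm` would be an actual LLM prompted to
# grade a rewrite; here a toy heuristic stands in so the code is runnable.

def judge_llm(prompt: str) -> str:
    # Stub: a real system would call an LLM and parse its numeric score.
    # This toy judge rewards rewrites that keep the original terms and
    # add new ones, capped at 5 like a typical grading rubric.
    original, rewrite = prompt.split(" ||| ")
    kept = len(set(original.split()) & set(rewrite.split()))
    added = len(set(rewrite.split()) - set(original.split()))
    return str(min(5, kept + added))

def reward(original_query: str, rewritten_query: str) -> float:
    """Map a judge score (0-5) to a scalar reward in [0, 1] for PPO."""
    score = judge_llm(f"{original_query} ||| {rewritten_query}")
    return int(score) / 5.0
```

In a PPO loop, this scalar would be assigned to each sampled rewrite from the student policy; the appeal of the judge is that it yields scalable reward signals without manual relevance annotations.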


๐Ÿ“ Abstract
Query rewriting (QR) is a critical technique in e-commerce search, addressing the lexical gap between user queries and product descriptions to enhance search performance. Existing QR approaches typically fall into two categories: discriminative models and generative methods leveraging large language models (LLMs). Discriminative models often struggle with natural language understanding and offer limited flexibility in rewriting, while generative LLMs, despite producing high-quality rewrites, face high inference latency and cost in online settings. These limitations force offline deployment, making them vulnerable to issues like information staleness and semantic drift. To overcome these challenges, we propose a novel hybrid pipeline for QR that balances efficiency and effectiveness. Our approach combines offline knowledge distillation, which produces a lightweight yet efficient student model, with online reinforcement learning (RL) that refines query rewriting dynamically using real-time feedback. A key innovation is the use of LLMs as simulated human feedback, enabling scalable reward signals and cost-effective evaluation without manual annotations. Experimental results on the Amazon ESCI dataset demonstrate significant improvements in query relevance, diversity, and adaptability, as well as positive feedback from the LLM simulation. This work contributes to advancing LLM capabilities for domain-specific applications, offering a robust solution for dynamic and complex e-commerce search environments.
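The offline distillation step in the abstract can be illustrated with a standard Hinton-style distillation loss, where the student is trained to match the teacher's softened output distribution. This is a generic sketch under assumed temperature scaling, not the paper's exact objective; `teacher_logits` and `student_logits` are illustrative per-token logit vectors.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax: higher T flattens the distribution,
    # exposing the teacher's "dark knowledge" about near-miss tokens.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

The loss is zero when the student exactly matches the teacher and grows as their distributions diverge; in the hybrid pipeline, the student initialized this way is then refined online with RL.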
Problem

Research questions and friction points this paper is trying to address.

Query Rewriting
E-commerce Search
Semantic Drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Query Rewriting
Large Language Model
Offline-Online Learning