LESER: Learning to Expand via Search Engine-feedback Reinforcement in e-Commerce

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
E-commerce search queries are typically short and ambiguous, and a single query often maps to several distinct user intents, which makes precise matching against structured product catalogs difficult. Existing neural query expansion and prompt-based large language model (LLM) approaches generalize poorly, often violate platform constraints, and are hard to scale in production. To address these limitations, we propose LESER, a query expansion framework that fine-tunes a context-aware LLM with reinforcement learning, using real-time search engine feedback as supervision. The method applies Group Relative Policy Optimization to learn directly from relevance and semantic coverage metrics. Evaluated on large-scale industrial e-commerce data, our approach improves both offline retrieval metrics and online user engagement, and it is designed to be scalable and practical for production use.

📝 Abstract
User queries in e-commerce search are often vague, short, and underspecified, making it difficult for retrieval systems to match them accurately against structured product catalogs. This challenge is amplified by the one-to-many nature of user intent, where a single query can imply diverse and competing needs. Existing methods, including neural query expansion and prompting-based LLM approaches, fall short in real-world settings: they struggle to capture nuanced user intent, often generate outputs that violate platform constraints, and rely on workflows that are difficult to scale in production. We propose Learning to Expand via Search Engine-feedback Reinforcement (LESER), a novel framework that fine-tunes a context-aware LLM using real-time search engine feedback as supervision. LESER formulates query expansion as a retrieval optimization task and leverages Group Relative Policy Optimization to learn directly from relevance and coverage metrics. LESER is trained to reason over search results and produce high-quality query expansions that align with platform rules and retrieval objectives. We evaluate LESER on large-scale, real-world e-commerce datasets, demonstrating substantial improvements in both offline and online settings. Our results show that LESER not only enhances semantic coverage and retrieval relevance but also delivers measurable gains in user engagement, making it a practical and scalable solution for modern search systems.
Problem

Research questions and friction points this paper is trying to address.

Expanding vague e-commerce queries for better retrieval
Aligning query expansion with platform constraints and objectives
Improving semantic coverage and relevance in search results
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes a context-aware LLM using real-time search engine feedback as supervision
Leverages Group Relative Policy Optimization to learn directly from relevance and coverage metrics
Produces query expansions aligning with platform rules
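
To make the Group Relative Policy Optimization step more concrete, here is a minimal sketch of how a group of candidate expansions for one query might be scored against each other. It is an illustration under stated assumptions, not the paper's implementation: the function names, the linear blend of relevance and coverage, the 0.5 weight, and the example scores are all hypothetical. The only part taken from the general GRPO recipe is the idea of normalizing each candidate's reward by the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def combined_reward(relevance: float, coverage: float, weight: float = 0.5) -> float:
    """Blend two search-feedback metrics into one scalar reward.

    Hypothetical: the source only says the policy learns from relevance
    and coverage metrics; the linear blend and weight are assumptions.
    """
    return weight * relevance + (1.0 - weight) * coverage


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of sampled expansions.

    Each candidate is compared with the other candidates sampled for the
    same query, so no separate value network is needed as a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (relevance, coverage) scores returned by a search engine
# for three candidate expansions of the same query.
feedback = [(0.9, 0.4), (0.5, 0.5), (0.2, 0.8)]
rewards = [combined_reward(rel, cov) for rel, cov in feedback]
advantages = group_relative_advantages(rewards)

for (rel, cov), adv in zip(feedback, advantages):
    print(f"relevance={rel:.1f} coverage={cov:.1f} advantage={adv:+.3f}")
```

Candidates with above-average reward in their group receive a positive advantage and are reinforced, while below-average candidates are discouraged. When every candidate in a group scores the same, all advantages are zero and that group contributes no learning signal.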