AI Summary
This work addresses the challenges of applying generative retrieval in industrial e-commerce search, namely the vast and dynamic product catalog, stringent latency constraints, and misalignment with downstream ranking objectives. To overcome these issues, the authors propose CQ-SID, a framework that treats generative retrieval as a complementary recall component rather than an end-to-end replacement. It combines category-aware and query-item contrastive learning with a residual quantized VAE to encode items into hierarchical semantic cluster IDs, substantially reducing beam search complexity. Furthermore, an expert-guided group relative policy optimization (EG-GRPO) reinforcement learning approach is designed to align retrieval outputs with downstream ranking goals. Offline experiments show relative gains of up to 26.76% in semantic hit rate and 11.11% in personalized click hit rate over RQ-VAE baselines, alongside a 50% reduction in beam size. Online A/B tests show a 1.15% increase in GMV and a 0.40% uplift in UCTCVR, with over 72% of purchases attributed to the generative recall channel.
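To illustrate how residual quantization yields hierarchical identifiers, the following is a minimal sketch of the greedy quantization step only. It is not the authors' CQ-SID implementation: the function name `residual_quantize`, the toy codebooks, and the 2-D item embeddings are all assumptions made for illustration, and the VAE training and contrastive objectives are omitted.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Greedy residual quantization (illustrative, hypothetical helper).

    At each level, pick the codeword nearest to the current residual,
    record its index, and subtract it before moving to the next level.
    """
    residual = np.asarray(x, dtype=float)
    codes = []
    for codebook in codebooks:
        # Squared Euclidean distance from the residual to every codeword.
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        residual = residual - codebook[idx]
    return tuple(codes), residual

# Toy codebooks (assumed values): a coarse level and a fine level.
codebooks = [
    np.array([[0.0, 0.0], [10.0, 10.0]]),
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
]

# Two nearby item embeddings (assumed values).
sid_a, _ = residual_quantize(np.array([10.9, 10.1]), codebooks)
sid_b, _ = residual_quantize(np.array([10.1, 11.2]), codebooks)
print(sid_a, sid_b)  # (1, 1) (1, 2)
```

The two items share the first-level code and differ only at the second level. A shared prefix of this kind is what lets a decoder narrow the candidate set level by level, which is the general reason hierarchical IDs can make beam search cheaper.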
Abstract
Generative retrieval offers a promising alternative by unifying the fragmented multi-stage retrieval process into a single end-to-end model. However, its practical adoption in industrial e-commerce search remains challenging, given the massive and dynamic product catalogs, strict latency requirements, and the need to align retrieval with downstream ranking goals. In this work, we propose a retrieval framework tailored for real-world recall scenarios, positioning generative retrieval as a recall-stage supplement rather than an end-to-end replacement. Our method, CQ-SID (Category-and-Query constrained Semantic ID), employs category-aware and query-item contrastive learning along with Residual Quantized VAEs to encode items into hierarchical semantic cluster identifiers, significantly reducing beam search complexity. Additionally, we develop EG-GRPO (Expert-Guided Group Relative Policy Optimization), a reinforcement learning approach that aligns generative recall with downstream ranking under sparse rewards by injecting ground-truth samples to stabilize training. Offline experiments on TmallAPP search logs show that CQ-SID achieves up to 26.76% and 11.11% relative gains in semantic and personalized click hit rate, respectively, over RQ-VAE baselines, while halving beam search size. EG-GRPO further improves multi-objective performance. Online A/B tests confirm gains in GMV (+1.15%) and UCTCVR (+0.40%). The generative recall channel now contributes substantially in production, accounting for over 50.25% of exposures, 58.96% of clicks, and 72.63% of purchases, demonstrating a viable path for deploying generative retrieval in real-world e-commerce systems.
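The sparse-reward problem that EG-GRPO targets can be illustrated with a small sketch of group-relative advantages. This is not the paper's algorithm: it assumes the common formulation in which each sample's advantage is its reward standardized within its group, and the function name, reward values, and the way the expert sample is appended are hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (illustrative helper)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Sparse rewards: none of the sampled candidates hit (assumed values).
sampled_rewards = [0.0, 0.0, 0.0, 0.0]
adv_plain = group_relative_advantages(sampled_rewards)

# Assumption: a ground-truth ("expert") sample with reward 1.0 is
# appended to the same group before advantages are computed.
expert_reward = 1.0
adv_guided = group_relative_advantages(sampled_rewards + [expert_reward])

print(adv_plain)   # all zeros: no learning signal from this group
print(adv_guided)  # expert sample positive, the others negative
```

When every sampled reward is identical, the standardized advantages are all zero and the group contributes no gradient. Adding one rewarded ground-truth sample restores a nonzero signal, which is one plausible reading of how injecting ground-truth samples stabilizes training under sparse rewards.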