GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

📅 2026-04-16
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses three key challenges in deploying generative recommendation systems at industrial scale: inconsistent outputs across paginated requests, the high computational cost of encoding long user behavior sequences, and misalignment between the generation policy and user preferences. To tackle these issues, the authors propose GenRec, a decoder-only framework that introduces a page-level next-token prediction task to resolve point-to-set ambiguity, uses an asymmetric linear token-merging mechanism that roughly halves the input length of long histories encoded as multi-token semantic IDs, and aligns generation with user preferences through GRPO-SR, a group relative policy optimization method with NLL regularization and hybrid rewards. Online A/B tests on the JD.com mobile app show a 9.5% increase in clicks and an 8.7% increase in orders.
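The contrast between point-wise and page-level supervision can be illustrated with a small sketch. It assumes each item is a tuple of semantic-ID codes and that the page-wise loss averages the teacher-forced NLL of every item on the page given one shared context; the summary does not specify the exact aggregation, so this is one plausible reading, not GenRec's implementation. All function names (`pointwise_examples`, `pagewise_loss`, `uniform_logprob`, ...) are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical types: an item is a tuple of semantic-ID codes.
SemanticId = Tuple[int, ...]
TokenLogProb = Callable[[Tuple[SemanticId, ...], Tuple[int, ...], int], float]


def pointwise_examples(context: Sequence[SemanticId], page: Sequence[SemanticId]):
    """Point-wise NTP: one (input, target) pair per interacted item.

    Every pair shares the same input, so the model sees conflicting targets.
    """
    return [(tuple(context), item) for item in page]


def pagewise_example(context: Sequence[SemanticId], page: Sequence[SemanticId]):
    """Page-wise NTP: a single (input, target) pair whose target is the whole page."""
    return (tuple(context), tuple(page))


def item_nll(logprob: TokenLogProb, context, item: SemanticId) -> float:
    """Teacher-forced NLL of one multi-token semantic ID given the context."""
    nll, prefix = 0.0, ()
    for code in item:
        nll -= logprob(context, prefix, code)
        prefix += (code,)
    return nll


def pagewise_loss(logprob: TokenLogProb, example) -> float:
    """Average item NLL over every item on the page (assumed aggregation)."""
    context, page = example
    return sum(item_nll(logprob, context, item) for item in page) / len(page)


def uniform_logprob(codebook_size: int) -> TokenLogProb:
    """Toy model: every code is equally likely at every decoding step."""
    return lambda context, prefix, code: -math.log(codebook_size)


if __name__ == "__main__":
    history = [(1, 2), (3, 0)]
    page = [(0, 1), (2, 3), (1, 1)]
    pairs = pointwise_examples(history, page)
    print(len(pairs), len({inp for inp, _ in pairs}))  # 3 1: three targets, one input
    print(pagewise_loss(uniform_logprob(4), pagewise_example(history, page)))
```

In the point-wise view the same input maps to three different targets, which is the one-to-many ambiguity; the page-wise view turns them into a single supervised example, so every item on the page contributes gradient to one update.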

πŸ“ Abstract
Generative Retrieval (GR) offers a promising paradigm for recommendation through next-token prediction (NTP). However, scaling it to large-scale industrial systems introduces three challenges: (i) within a single request, identical model inputs may produce inconsistent outputs due to the pagination request mechanism; (ii) encoding long user behavior sequences with multi-token item representations based on semantic IDs is prohibitively expensive; and (iii) the generative policy must be aligned with nuanced user preference signals. We present GenRec, a preference-oriented generative framework deployed on the JD App that addresses these challenges within a single decoder-only architecture. For the training objective, we propose a Page-wise NTP task, which supervises over an entire interaction page rather than over each interacted item individually, providing a denser gradient signal and resolving the one-to-many ambiguity of point-wise training. On the prefilling side, an asymmetric linear Token Merger compresses multi-token Semantic IDs in the prompt while preserving full-resolution decoding, reducing input length by ~2× with negligible accuracy loss. To further align outputs with user satisfaction, we introduce GRPO-SR, a reinforcement learning method that pairs Group Relative Policy Optimization with NLL regularization for training stability and employs Hybrid Rewards, which combine a dense reward model with a relevance gate, to mitigate reward hacking. In month-long online A/B tests serving production traffic, GenRec achieves a 9.5% improvement in click count and an 8.7% improvement in transaction count over the existing pipeline.
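The asymmetric Token Merger idea can be sketched as follows, under stated assumptions: a merge factor of 2, merging by concatenating adjacent token embeddings and projecting them back to the model width with one linear map, merging only within a single item's semantic ID, and applying it to the prompt only (the decoder still generates one semantic-ID token per step). The abstract does not give these details, and the names `merge_item_tokens`, `compress_prompt` and `averaging_weight` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, x: Sequence[float]) -> Vector:
    """y = W x, with W of shape (d_out, d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def merge_item_tokens(token_embs: Sequence[Vector], weight: Matrix, group: int = 2) -> List[Vector]:
    """Concatenate each run of `group` consecutive token embeddings and project back to d."""
    d = len(token_embs[0])
    merged = []
    for start in range(0, len(token_embs), group):
        chunk = list(token_embs[start:start + group])
        chunk += [[0.0] * d] * (group - len(chunk))  # zero-pad a short tail (assumption)
        concat = [v for emb in chunk for v in emb]   # length d * group
        merged.append(linear(weight, concat))        # back to length d
    return merged


def compress_prompt(history: Sequence[Sequence[Vector]], weight: Matrix, group: int = 2) -> List[Vector]:
    """Prompt side only: merge within each item so no merged token spans two items."""
    prompt: List[Vector] = []
    for item_tokens in history:
        prompt.extend(merge_item_tokens(item_tokens, weight, group))
    return prompt


def averaging_weight(d: int, group: int = 2) -> Matrix:
    """A fixed W that averages the grouped embeddings; in practice W would be learned."""
    return [[1.0 / group if j % d == i else 0.0 for j in range(d * group)] for i in range(d)]
```

For example, if every item used a four-token semantic ID (an assumption for illustration), a 50-item history would shrink from 200 to 100 prompt positions, in line with the ~2× reduction reported above. Keeping the decoding side at full resolution means generated semantic IDs remain token-exact.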
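A simplified sketch of how GRPO-SR's pieces could fit together: for one prompt, a group of sampled outputs is scored with a hybrid reward, rewards are normalized within the group to form advantages, and an NLL term on the logged output is added for stability. The gate form (zeroing the reward-model score below a relevance threshold), the threshold of 0.5, the NLL weight of 0.1, and the use of sequence-level log-probabilities without PPO-style clipping or a KL term are all assumptions; the function names are hypothetical and this is not the paper's exact objective.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def hybrid_reward(rm_score: float, relevance: float, threshold: float = 0.5) -> float:
    """Dense reward-model score, zeroed when a relevance gate fails (assumed form)."""
    return rm_score if relevance >= threshold else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_sr_loss(sample_logprobs: Sequence[float], rewards: Sequence[float],
                 target_logprob: float, nll_weight: float = 0.1) -> float:
    """Policy-gradient term on group-relative advantages plus an NLL regularizer.

    Simplification: sequence-level log-probs, no ratio clipping and no KL penalty.
    """
    advantages = group_relative_advantages(rewards)
    pg_loss = -fmean(a * lp for a, lp in zip(advantages, sample_logprobs))
    nll_loss = -target_logprob  # NLL of the logged (ground-truth) output
    return pg_loss + nll_weight * nll_loss


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.7]      # reward-model scores for 4 sampled outputs
    relevance = [0.9, 0.2, 0.8, 0.6]   # relevance-gate inputs for the same samples
    rewards = [hybrid_reward(s, r) for s, r in zip(scores, relevance)]
    print(rewards)  # [0.9, 0.0, 0.2, 0.7]
    print(group_relative_advantages(rewards))
```

In this toy group the second sample has a high reward-model score but fails the relevance gate, so it receives a negative advantage instead of being reinforced, which is the kind of reward hacking the gate is meant to block.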
Problem

Research questions and friction points this paper is trying to address.

Generative Retrieval
Large-Scale Recommendation
User Preference Alignment
Pagination Consistency
Long Sequence Encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Retrieval
Page-wise NTP
Token Merger
GRPO-SR
Semantic ID Compression
πŸ”Ž Similar Papers
No similar papers found.