From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a critical limitation in existing recommendation reranking methods, which rely on local index selection and consequently suffer from inconsistent action semantics and unstable item representations. To overcome this, the paper proposes the first approach that formulates reranking as a global identifier generation task, representing items as discrete token sequences to construct a semantically coherent action space and eliminate dependence on input order. The method uses a two-stage optimization strategy: supervised pretraining followed by reinforcement learning fine-tuning. Experiments show consistent improvements over state-of-the-art baselines on two public benchmarks and a large-scale industrial dataset. Online A/B tests further confirm its practical effectiveness, with particularly notable gains in cold-start scenarios.
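The two-stage strategy can be written down generically as follows. This is only a sketch of the usual form such objectives take, not the paper's stated formulation: the symbols $x$ (user context and candidate list), $y^{*}$ (demonstration token sequence), $\pi_\theta$ (the generative policy), and $R$ (list-wise utility) are notation assumed here for illustration.

```latex
% Stage 1 (supervised pretraining): maximize the likelihood of demonstration sequences.
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\sum_{t=1}^{T} \log \pi_\theta\!\left(y^{*}_{t} \mid y^{*}_{<t},\, x\right)

% Stage 2 (reinforcement learning fine-tuning): maximize expected list-wise utility.
J_{\mathrm{RL}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\left[ R(y, x) \right]
```

The summary does not say which reinforcement learning algorithm or reward is used, so the second objective is left in its most general form.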
📝 Abstract
In modern recommender systems, list-wise reranking serves as a critical phase within the multi-stage pipeline, finalizing the exposed item sequence and directly impacting user satisfaction by modeling complex intra-list item dependencies. Existing methods typically formulate this task as selecting indices from the local input list. However, this approach suffers from a semantically inconsistent action space: the same output neuron (logit) represents different items across different samples, preventing the model from establishing a stable, intrinsic understanding of the items. To address this, we propose GloRank (Global Action Space Ranker), a generative framework that shifts reranking from selecting local indices to generating global identifiers. Specifically, we represent items as sequences of discrete tokens and reformulate reranking as a token generation task. This design effectively decouples the scoring mechanism from the variable input order, ensuring that items are evaluated against a consistent global standard. We further enhance this with a two-stage optimization pipeline: a supervised pre-training phase to initialize the model with high-quality demonstrations, followed by a reinforcement learning-based post-training phase to directly maximize list-wise utility. Extensive experiments on two public benchmarks and a large-scale industrial dataset, coupled with online A/B tests, demonstrate that GloRank consistently outperforms state-of-the-art baselines and achieves superior robustness in cold-start scenarios.
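The contrast between the two action spaces can be illustrated with a small toy example. This is a sketch under stated assumptions, not GloRank's implementation: the item names, the `GLOBAL_IDS` codebook, and both helper functions are hypothetical, and the abstract does not say how token sequences are actually assigned to items.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: each item maps to a fixed sequence of discrete tokens.
# The token values are made up purely for illustration.
GLOBAL_IDS: Dict[str, Tuple[int, ...]] = {
    "item_a": (3, 17),
    "item_b": (3, 42),
    "item_c": (9, 5),
}


def local_index_actions(candidates: List[str]) -> Dict[int, str]:
    """Local action space: action k means 'pick whatever sits at position k'."""
    return dict(enumerate(candidates))


def global_id_actions(candidates: List[str]) -> Dict[Tuple[int, ...], str]:
    """Global action space: an action is an item's token sequence, whatever its position."""
    return {GLOBAL_IDS[item]: item for item in candidates}


list_1 = ["item_a", "item_b", "item_c"]
list_2 = ["item_c", "item_a", "item_b"]  # same items, different input order

# With local indices, action 0 refers to a different item in each sample.
print(local_index_actions(list_1)[0], local_index_actions(list_2)[0])

# With global identifiers, the action-to-item mapping does not depend on input order.
print(global_id_actions(list_1) == global_id_actions(list_2))
```

In the local formulation the meaning of an output position changes from sample to sample, which is the inconsistency the abstract describes. In the global formulation each token sequence always denotes the same item, so a generative model emitting those tokens is scored against a fixed vocabulary.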
Problem

Research questions and friction points this paper is trying to address.

reranking
action space
recommender systems
semantic inconsistency
global identifiers
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative reranking
global action space
token-based item representation
list-wise recommendation
reinforcement learning