Recurrent Preference Memory for Efficient Long-Sequence Generative Recommendation

📅 2026-02-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high computational overhead and accumulated interaction noise that generative recommendation models face when processing lifelong user behavior sequences. To mitigate these issues, the authors propose a preference-memory-based compression mechanism that recursively condenses long interaction histories into compact embedding tokens, replacing conventional key-value (KV) caching and substantially reducing both memory footprint and computational cost. A self-referential teacher-forcing strategy parallelizes the otherwise serial recursive updates, preserving the expressiveness of iterative memory refinement while keeping training and inference efficient. Experiments on large-scale benchmarks show that the method significantly lowers inference latency and memory consumption while achieving superior recommendation accuracy compared to full-sequence models.
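The core idea of recursively condensing a long history into a fixed-size memory can be sketched in a few lines. This is a minimal, purely illustrative sketch: the `update_memory` function, its gating rule, and the `alpha` parameter are hypothetical stand-ins for the paper's learned memory-update network, chosen only to show why the state stays O(1) regardless of history length.

```python
import math

def gate(x):
    """Logistic squashing to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def update_memory(memory, item, alpha=0.5):
    """Blend one item embedding into the fixed-size memory.
    A data-dependent gate decides how much of the old memory to keep,
    standing in for the learned update network in the paper."""
    g = gate(sum(m * v for m, v in zip(memory, item)))
    keep = alpha + (1 - alpha) * g
    return [keep * m + (1 - keep) * v for m, v in zip(memory, item)]

def compress_history(history, dim=4):
    """Fold an arbitrarily long interaction history into one
    Preference Memory token of size `dim` (constant-size state,
    unlike a KV cache that grows with sequence length)."""
    memory = [0.0] * dim
    for item in history:
        memory = update_memory(memory, item)
    return memory
```

The contrast with KV caching is the storage cost: here the state passed forward is a single `dim`-sized vector, whereas a KV cache holds keys and values for every past token.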

πŸ“ Abstract
Generative recommendation (GenRec) models typically model user behavior via full attention, but scaling to lifelong sequences is hindered by prohibitive computational costs and noise accumulation from stochastic interactions. To address these challenges, we introduce Rec2PM, a framework that compresses long user interaction histories into compact Preference Memory tokens. Unlike traditional recurrent methods that suffer from serial training, Rec2PM employs a novel self-referential teacher-forcing strategy: it leverages a global view of the history to generate reference memories, which serve as supervision targets for parallelized recurrent updates. This allows for fully parallel training while maintaining the capability for iterative updates during inference. Additionally, by representing memory as token embeddings rather than extensive KV caches, Rec2PM achieves extreme storage efficiency. Experiments on large-scale benchmarks show that Rec2PM significantly reduces inference latency and memory footprint while achieving superior accuracy compared to full-sequence models. Analysis reveals that the Preference Memory functions as a denoising Information Bottleneck, effectively filtering interaction noise to capture robust long-term interests.
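The self-referential teacher-forcing idea from the abstract can be sketched as follows. This is a hedged toy sketch, not the paper's method: the gated `step` update and the use of prefix means as "reference memories" are illustrative assumptions standing in for the learned recurrent update and the global-view reference generator. What it does show is the key structural trick, feeding each step the reference memory for the previous position instead of its own prior output, so every step is independent and trainable in parallel.

```python
def step(prev_memory, item, alpha=0.75):
    """Toy recurrent memory update (hypothetical stand-in for the
    learned memory-update network)."""
    return [alpha * m + (1 - alpha) * v for m, v in zip(prev_memory, item)]

def reference_memories(history, dim):
    """Global teacher pass: prefix means of the full history act as
    reference memories (an illustrative proxy for the global view)."""
    refs, running = [], [0.0] * dim
    for t, item in enumerate(history, start=1):
        running = [r + v for r, v in zip(running, item)]
        refs.append([r / t for r in running])
    return refs

def teacher_forced_updates(history, dim):
    """Self-referential teacher forcing: step t consumes the reference
    memory for step t-1 rather than its own previous output, so all
    steps decouple and can be computed (and supervised) in parallel."""
    refs = reference_memories(history, dim)
    prev = [[0.0] * dim] + refs[:-1]  # teacher-forced previous states
    preds = [step(m, x) for m, x in zip(prev, history)]
    losses = [sum((p - r) ** 2 for p, r in zip(pred, ref))
              for pred, ref in zip(preds, refs)]
    return preds, losses
```

At inference time the same `step` function is applied serially to its own outputs, which is why training parallelism does not sacrifice the capability for iterative updates.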
Problem

Research questions and friction points this paper is trying to address.

Generative Recommendation
Long-Sequence Modeling
Computational Efficiency
Noise Accumulation
Scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference Memory
Generative Recommendation
Parallel Recurrent Training
Information Bottleneck
Long-Sequence Modeling