Sequential Data Augmentation for Generative Recommendation

📅 2025-09-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Data augmentation in generative recommendation has long been underappreciated: existing methods lack systematic modeling and fail to balance generalizability and efficiency. To address this, we propose GenPAS, a framework that formally characterizes sequential data augmentation as a bias-controlled, three-step stochastic sampling process: sequence sampling, target sampling, and input sampling. GenPAS unifies mainstream augmentation strategies as special cases and enables explicit control over the training distribution. By decoupling structural fidelity from semantic bias in augmentation, it improves model robustness to long-tail patterns and sparse user-item interactions. Extensive experiments on multiple public and industrial benchmarks show that GenPAS consistently improves Recall@K and NDCG while reducing required training data by over 30% and model parameters by 20%, indicating gains in accuracy, data efficiency, and parameter efficiency.

πŸ“ Abstract
Generative recommendation plays a crucial role in personalized systems, predicting users' future interactions from their historical behavior sequences. A critical yet underexplored factor in training these models is data augmentation, the process of constructing training data from user interaction histories. By shaping the training distribution, data augmentation directly and often substantially affects model generalization and performance. Nevertheless, in much of the existing work, this process is simplified, applied inconsistently, or treated as a minor design choice, without a systematic and principled understanding of its effects. Motivated by our empirical finding that different augmentation strategies can yield large performance disparities, we conduct an in-depth analysis of how they reshape training distributions and influence alignment with future targets and generalization to unseen inputs. To systematize this design space, we propose GenPAS, a generalized and principled framework that models augmentation as a stochastic sampling process over input-target pairs with three bias-controlled steps: sequence sampling, target sampling, and input sampling. This formulation unifies widely used strategies as special cases and enables flexible control of the resulting training distribution. Our extensive experiments on benchmark and industrial datasets demonstrate that GenPAS yields superior accuracy, data efficiency, and parameter efficiency compared to existing strategies, providing practical guidance for principled training data construction in generative recommendation.
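As a rough illustration of the three-step process described in the abstract, the sketch below draws one input-target pair by sampling a sequence, then a target position, then an input start. The exponent parameters `alpha`, `beta`, and `gamma`, the power-law weighting, and the function name `sample_training_pair` are assumptions made for this example; the abstract does not specify how each step's bias is parameterized.

```python
import random


def sample_training_pair(histories, alpha=1.0, beta=1.0, gamma=0.0, rng=None):
    """Draw one (input, target) pair via three biased sampling steps.

    alpha, beta, gamma are hypothetical bias knobs, not taken from the paper.
    """
    rng = rng or random.Random()

    # A history needs at least two items to yield an input and a target.
    usable = [h for h in histories if len(h) >= 2]
    if not usable:
        raise ValueError("need at least one history with two or more items")

    # Step 1: sequence sampling. Longer histories are favored when alpha > 0.
    seq_weights = [len(h) ** alpha for h in usable]
    seq = rng.choices(usable, weights=seq_weights, k=1)[0]

    # Step 2: target sampling. Later positions are favored when beta > 0.
    positions = list(range(1, len(seq)))
    pos_weights = [(p + 1) ** beta for p in positions]
    t = rng.choices(positions, weights=pos_weights, k=1)[0]

    # Step 3: input sampling. Longer inputs (earlier starts) are favored
    # when gamma > 0; gamma = 0 makes every start equally likely.
    starts = list(range(t))
    start_weights = [(t - s) ** gamma for s in starts]
    s = rng.choices(starts, weights=start_weights, k=1)[0]

    return seq[s:t], seq[t]


if __name__ == "__main__":
    histories = [[1, 2, 3, 4, 5], [10, 11, 12]]
    print(sample_training_pair(histories, rng=random.Random(0)))
```

Under these assumptions, pushing `beta` and `gamma` to large values concentrates the samples on "full history, last item" pairs, while setting all three to zero spreads them uniformly, which is one way to picture the "flexible control of the resulting training distribution" the abstract mentions.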
Problem

Research questions and friction points this paper is trying to address.

Systematically analyzing data augmentation effects on generative recommendation
Proposing principled framework for stochastic sampling in training data construction
Improving model accuracy and efficiency through controlled augmentation strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic sampling process framework
Bias-controlled three-step augmentation
Unifies strategies as special cases
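To make "special cases" concrete, the sketch below enumerates the input-target pairs produced by three deterministic ways of cutting up a single history. The strategy names (`last_target`, `prefix_targets`, `all_windows`) and their exact definitions are assumptions chosen for illustration, since this page does not name the strategies the paper unifies. Each one can be read as restricting which target positions and input starts are allowed.

```python
def last_target(seq):
    """Only the final item is a target; the input is everything before it."""
    return [(seq[:-1], seq[-1])] if len(seq) >= 2 else []


def prefix_targets(seq):
    """Every item after the first is a target; the input is its full prefix."""
    return [(seq[:t], seq[t]) for t in range(1, len(seq))]


def all_windows(seq):
    """Every target position paired with every contiguous input ending before it."""
    return [(seq[s:t], seq[t]) for t in range(1, len(seq)) for s in range(t)]


if __name__ == "__main__":
    history = [1, 2, 3, 4]
    for strategy in (last_target, prefix_targets, all_windows):
        print(strategy.__name__, strategy(history))
```

For a history of length n, these hypothetical strategies yield 1, n - 1, and n(n - 1)/2 pairs respectively, so they imply quite different training distributions even before any stochastic weighting is applied.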
Geon Lee
KAIST, Seoul, Republic of Korea
Bhuvesh Kumar
Snap
Recommendation Systems, Machine Learning, Algorithmic Game Theory, Differential Privacy
Clark Mingxuan Ju
Snap Inc., Bellevue, WA, USA
Tong Zhao
Snap Inc., Bellevue, WA, USA
Kijung Shin
Associate Professor, KAIST
Data Mining, Graph Mining, Network Science
Neil Shah
Snap Inc., Bellevue, WA, USA
Liam Collins
Snap Inc., Bellevue, WA, USA