🤖 AI Summary
To address the challenge of modeling temporal preferences from short user behavior sequences, this paper proposes BARec, a bidirectional temporal data augmentation and pretraining framework. Methodologically, BARec introduces: (1) a novel bidirectional temporal augmentation mechanism that uses contrastive learning to generate semantically consistent, preference-preserving forward and backward pseudo-items without disrupting chronological order; (2) knowledge graph–enhanced fine-tuning that improves the semantic plausibility and preference alignment of the pseudo-items; and (3) a geometric analysis of the item embedding space, coupled with theoretical validation, to enhance model interpretability. Extensive experiments on multiple benchmark datasets demonstrate that BARec significantly outperforms state-of-the-art methods, achieving a 12.7% improvement in Recall@20 on short sequences and a 5.3% gain in NDCG@10 on long sequences. Visualization further confirms that the generated pseudo-items form coherent semantic clusters in the embedding space.
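To give a rough feel for the augmentation idea, the sketch below prepends "pseudo-prior" items to a short sequence while preserving chronological order. This is only a hypothetical frequency-based stand-in, not BARec's actual generator (which the paper learns via contrastive pretraining and refines with knowledge-graph fine-tuning); the function names and the greedy most-frequent-predecessor heuristic are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_predecessor_stats(corpus):
    """Count, for each item, which items immediately precede it
    across all training sequences (a crude proxy for a learned
    backward generator)."""
    preds = defaultdict(Counter)
    for seq in corpus:
        for prev, nxt in zip(seq, seq[1:]):
            preds[nxt][prev] += 1
    return preds

def augment_with_pseudo_priors(seq, preds, k=2):
    """Prepend up to k pseudo-prior items to a short sequence.
    Each pseudo-item is the most frequent observed predecessor of
    the current sequence head, so chronological order is never
    disturbed -- new items are only added *before* the sequence."""
    augmented = list(seq)
    for _ in range(k):
        head = augmented[0]
        candidates = preds.get(head)
        if not candidates:
            break  # no observed predecessor; stop augmenting
        pseudo = candidates.most_common(1)[0][0]
        augmented.insert(0, pseudo)
    return augmented

# Toy corpus of interaction sequences (item IDs).
corpus = [[1, 2, 3], [1, 2, 4], [5, 1, 2]]
stats = build_predecessor_stats(corpus)
print(augment_with_pseudo_priors([2, 3], stats, k=2))  # → [5, 1, 2, 3]
```

In the actual framework, this heuristic would be replaced by a model that generates pseudo-items in both temporal directions and is trained so that the augmented sequence stays consistent with the user's preferences.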
📝 Abstract
Sequential recommendation systems are integral to discerning temporal user preferences. Yet, the task of learning from abbreviated user interaction sequences poses a notable challenge. Data augmentation has been identified as a potent strategy to enhance the informational richness of these sequences. Traditional augmentation techniques, such as item randomization, may disrupt the inherent temporal dynamics. Although recent advancements in reverse chronological pseudo-item generation have shown promise, they can introduce temporal discrepancies when assessed in a natural chronological context. In response, we introduce a sophisticated approach, Bidirectional temporal data Augmentation with pre-training (BARec). Our approach leverages bidirectional temporal augmentation and knowledge-enhanced fine-tuning to synthesize authentic pseudo-prior items that *retain user preferences and capture deeper item semantic correlations*, thus boosting the model's expressive power. Our comprehensive experimental analysis confirms the superiority of BARec across both short and long sequence contexts. Moreover, theoretical examination and visual representation of item embeddings offer further insight into the model's logical processes and interpretability. The source code for our study is available at [https://github.com/juyongjiang/BARec](https://github.com/juyongjiang/BARec).