🤖 AI Summary
This work addresses the challenge in sequential recommendation posed by the long-tailed distribution of user-item interactions, where sparse data for tail users or items leads to insufficient preference learning, and existing methods often improve tail performance at the expense of head performance. To tackle this, the authors propose a tail-aware data augmentation framework featuring two novel operators—T-Substitute and T-Insert—that leverage a linear model to capture co-occurrence patterns among tail items, enabling sequence-level substitution and insertion. The framework further integrates representation-level mixing and cross-sequence augmentation strategies. Extensive experiments on multiple benchmark datasets demonstrate that the proposed approach significantly enhances recommendation performance for tail items while simultaneously preserving or even improving overall and head-item recommendation accuracy.
📝 Abstract
Sequential recommendation (SR) learns user preferences from their historical interaction sequences and provides personalized suggestions. In real-world scenarios, most users interact with only a handful of items, while the majority of items are seldom consumed. This pervasive long-tail challenge limits the model's ability to learn user preferences. Although previous efforts enrich tail items/users with knowledge from the head parts or improve tail learning through additional contextual information, they still face the following issues: 1) They struggle to remedy the scarcity of interactions for tail users/items, leading to incomplete preference learning for the tail parts. 2) Existing methods often degrade overall or head-part performance while improving accuracy for tail users/items, thereby harming the user experience. We propose Tail-Aware Data Augmentation (TADA) for long-tail sequential recommendation, which increases the interaction frequency of tail items/users while maintaining head performance, thereby promoting the model's learning capability for the tail. Specifically, we first capture the co-occurrence and correlation among low-popularity items with a linear model. Building upon this, we design two tail-aware augmentation operators, T-Substitute and T-Insert. The former replaces a head item with a relevant item, while the latter uses co-occurrence relationships to extend the original sequence with both head and tail items. The augmented and original sequences are mixed at the representation level to preserve preference knowledge. We further extend the mix operation across different tail-user sequences and augmented sequences to generate richer augmented samples, further improving tail performance. Comprehensive experiments demonstrate the superiority of our method. The code is available at https://github.com/KingGugu/TADA.
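The two augmentation operators and the representation-level mix described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy item-item similarity matrix (standing in for the learned linear co-occurrence model), and the head/tail split are all illustrative assumptions.

```python
import numpy as np

def t_substitute(seq, head_items, sim, rng):
    """T-Substitute (sketch): replace one randomly chosen head item in the
    sequence with its most correlated non-head (tail) item under `sim`."""
    seq = list(seq)
    head_pos = [i for i, item in enumerate(seq) if item in head_items]
    if not head_pos:
        return seq  # no head item to substitute
    pos = int(rng.choice(head_pos))
    scores = sim[seq[pos]].astype(float).copy()
    scores[list(head_items)] = -np.inf  # restrict candidates to tail items
    seq[pos] = int(np.argmax(scores))
    return seq

def t_insert(seq, sim, rng):
    """T-Insert (sketch): pick a random position and insert, right after it,
    the item that co-occurs most strongly with the item at that position."""
    seq = list(seq)
    pos = int(rng.integers(len(seq)))
    scores = sim[seq[pos]].astype(float).copy()
    scores[seq[pos]] = -np.inf  # avoid duplicating the anchor item
    seq.insert(pos + 1, int(np.argmax(scores)))
    return seq

def mix_representations(h_orig, h_aug, lam=0.5):
    """Representation-level mix: convex combination of the original and
    augmented sequence representations, preserving preference knowledge."""
    return lam * h_orig + (1.0 - lam) * h_aug

# Toy item-item correlation matrix (stand-in for the learned linear model).
sim = np.array([
    [0.0, 0.9, 0.1, 0.2, 0.0],
    [0.9, 0.0, 0.3, 0.1, 0.0],
    [0.1, 0.3, 0.0, 0.8, 0.4],
    [0.2, 0.1, 0.8, 0.0, 0.5],
    [0.0, 0.0, 0.4, 0.5, 0.0],
])
rng = np.random.default_rng(0)
# Item 0 is treated as the only head item; its best non-head match is item 1.
aug = t_substitute([0, 1, 2], head_items={0}, sim=sim, rng=rng)  # → [1, 1, 2]
```

The same mixing step could in principle be applied across different tail-user sequences, as the abstract's cross-sequence extension suggests, by feeding two users' representations into `mix_representations`.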