🤖 AI Summary
To address the challenges of scarce labeled data and the high computational cost of large-scale negative sampling in user sequence modeling for recommendation systems, this paper introduces the Barlow Twins self-supervised learning framework to behavioral sequence modeling for the first time. The authors propose lightweight, sequence-aware augmentation strategies—including masking and reordering—that eliminate explicit negative sampling and enable efficient mini-batch training. The method employs a dual-encoder architecture with sequence-level contrastive representation learning, substantially reducing reliance on negative samples and manual annotations. Extensive experiments on MovieLens-1M, MovieLens-20M, and Yelp demonstrate consistent improvements across three downstream recommendation tasks, with accuracy gains of 8%–20% over strong dual-encoder baselines. The approach offers a scalable, annotation-efficient paradigm for sequential recommendation.
📝 Abstract
User sequence modeling is crucial for modern large-scale recommendation systems, as it enables the extraction of informative representations of users and items from their historical interactions. These user representations are widely used in a variety of downstream tasks to enhance users' online experience. A key challenge in learning these representations is the lack of labeled training data. While self-supervised learning (SSL) methods have emerged as a promising solution for learning representations from unlabeled data, many existing approaches rely on extensive negative sampling, which can be computationally expensive and may not always be feasible in real-world scenarios. In this work, we propose an adaptation of Barlow Twins, a state-of-the-art SSL method, to user sequence modeling by incorporating suitable augmentation methods. Our approach aims to mitigate the need for large negative sample batches, enabling effective representation learning with smaller batch sizes and limited labeled data. We evaluate our method on the MovieLens-1M, MovieLens-20M, and Yelp datasets, demonstrating that it consistently outperforms the widely used dual-encoder model across three downstream tasks, achieving an 8%–20% improvement in accuracy. Our findings underscore the effectiveness of our approach in extracting valuable sequence-level information for user modeling, particularly in scenarios where labeled data is scarce and negative examples are limited.
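To make the core idea concrete, the following is a minimal NumPy sketch of the two ingredients the abstract describes: simple sequence augmentations (random item masking and local reordering) and the standard Barlow Twins objective, which pushes the cross-correlation matrix of two augmented views' embeddings toward the identity and therefore needs no negative samples. All function names, the mask-token convention, and the specific augmentation parameters here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def augment_sequence(seq, rng, mask_token=0, mask_prob=0.2):
    """Illustrative sequence augmentations: random item masking plus
    shuffling of one small contiguous window (a form of reordering).
    Parameter choices are assumptions, not the paper's settings."""
    seq = np.asarray(seq).copy()
    # Random masking: replace each item with a mask token w.p. mask_prob.
    mask = rng.random(len(seq)) < mask_prob
    seq = np.where(mask, mask_token, seq)
    # Local reordering: shuffle a short window of up to 3 items.
    if len(seq) > 2:
        start = int(rng.integers(0, len(seq) - 2))
        window = seq[start:start + 3].copy()
        rng.shuffle(window)
        seq[start:start + 3] = window
    return seq

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective on two views' batch embeddings (n x d):
    invariance term pulls the diagonal of the cross-correlation matrix
    toward 1; redundancy term pushes off-diagonal entries toward 0."""
    n, _ = z_a.shape
    # Normalize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = (z_a.T @ z_b) / n                                # d x d correlation
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)            # invariance
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy
    return on_diag + lam * off_diag
```

In a full pipeline, each user history would be augmented twice, both views encoded by the (shared or dual) sequence encoder, and `barlow_twins_loss` applied to the resulting batch embeddings; because the objective only decorrelates embedding dimensions within the mini-batch, it trains effectively at small batch sizes where contrastive methods would starve for negatives.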