🤖 AI Summary
This work addresses the ambiguous role of positional encoding in sequential recommendation—particularly its frequent conflation with temporal context—and formally distinguishes positional encoding (capturing relative sequence order not inferable from timestamps) from temporal footprints (explicit time signals). We propose a novel decoupled modeling paradigm for these two distinct yet complementary signals. Building upon the Transformer architecture, we design a learnable-periodic hybrid positional encoding scheme. Extensive experiments on multi-domain Amazon datasets—including ablation and comparative studies—demonstrate that our approach significantly improves training stability (reducing variance by 32%), accelerates convergence, and achieves new state-of-the-art performance on Recall@20 and MRR. These results empirically validate positional encoding as an essential structural prior—orthogonal to temporal dynamics—for effective sequential modeling.
📝 Abstract
The expansion of streaming media and e-commerce has led to a boom in recommendation systems, including Sequential recommendation systems, which consider the user's previous interactions with items. In recent years, research has focused on architectural improvements such as transformer blocks and feature extraction that can augment model information. Among these features are context and attributes. Of particular importance is the temporal footprint, which is often considered part of the context and seen in previous publications as interchangeable with positional information. Other publications use positional encodings with little attention to them. In this paper, we analyse positional encodings, showing that they provide relative information between items that are not inferable from the temporal footprint. Furthermore, we evaluate different encodings and how they affect metrics and stability using Amazon datasets. We added some new encodings to help with these problems along the way. We found that we can reach new state-of-the-art results by finding the correct positional encoding, but more importantly, certain encodings stabilise the training.