TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds

📅 2026-04-15
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the long-standing disconnect between multi-field feature interaction and user behavior sequence modeling in recommender systems, where naive unification can trigger Sequential Collapse Propagation (SCP): interaction with dimensionally ill non-sequence fields causes the sequence features to collapse dimensionally. To mitigate this, the authors propose TokenFormer, a unified architecture that jointly models multi-field features and sequential behaviors through a Bottom-Full-Top-Sliding (BFTS) attention scheme and a Non-Linear Interaction Representation (NLIR). BFTS applies full self-attention in the lower layers and shrinking-window sliding attention in the upper layers, while NLIR applies one-sided non-linear multiplicative transformations to the hidden states. The authors report state-of-the-art performance on public benchmarks and Tencent's advertising platform, with analysis indicating improved dimensional robustness and representation discriminability.

📝 Abstract
Recommender systems have historically developed along two largely independent paradigms: feature interaction models for modeling correlations among multi-field categorical features, and sequential models for capturing user behavior dynamics from historical interaction sequences. Although recent trends attempt to bridge these paradigms within shared backbones, we empirically reveal that naively unifying these two branches may lead to a failure mode of Sequential Collapse Propagation (SCP). That is, the interaction with those dimensionally ill non-sequence fields leads to the dimensional collapse of the sequence features. To overcome this challenge, we propose TokenFormer, a unified recommendation architecture with the following innovations. First, we introduce a Bottom-Full-Top-Sliding (BFTS) attention scheme, which applies full self-attention in the lower layers and shrinking-window sliding attention in the upper layers. Second, we introduce a Non-Linear Interaction Representation (NLIR) that applies one-sided non-linear multiplicative transformations to the hidden states. Extensive experiments on public benchmarks and Tencent's advertising platform demonstrate state-of-the-art performance, while detailed analyses confirm that TokenFormer significantly improves dimensional robustness and representation discriminability under unified modeling.
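
To make the BFTS layer schedule concrete, here is a minimal sketch of per-layer attention masks: full attention in the lower layers, then a sliding window that shrinks in the upper layers. The abstract does not give the window sizes, the shrink rule, or whether the window is causal, so the function name `bfts_masks`, its parameters, the symmetric window, and the halving schedule are all illustrative assumptions rather than the authors' configuration.

```python
def bfts_masks(seq_len, num_layers, num_full_layers, first_window):
    """Build one boolean mask per layer; mask[i][j] is True if token i may attend to token j.

    Hypothetical helper: the symmetric window and the halving schedule
    are assumptions made for illustration, not taken from the paper.
    """
    masks = []
    window = first_window
    for layer in range(num_layers):
        if layer < num_full_layers:
            # Lower layers: full self-attention, every token sees every token.
            mask = [[True] * seq_len for _ in range(seq_len)]
        else:
            # Upper layers: token i sees token j only when |i - j| < window.
            mask = [[abs(i - j) < window for j in range(seq_len)]
                    for i in range(seq_len)]
            # Assumed shrink rule: halve the window for the next layer.
            window = max(1, window // 2)
        masks.append(mask)
    return masks


def allowed_pairs(mask):
    """Count the (query, key) pairs a mask permits."""
    return sum(sum(row) for row in mask)


masks = bfts_masks(seq_len=6, num_layers=4, num_full_layers=2, first_window=4)
print([allowed_pairs(m) for m in masks])
```

The number of permitted query-key pairs stays constant across the full-attention layers and drops as the window narrows, which is the qualitative behavior the scheme's name describes.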
Problem

Research questions and friction points this paper is trying to address.

Sequential Collapse Propagation
multi-field recommendation
sequential recommendation
dimensional collapse
unified recommendation
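
Dimensional collapse, as listed above, means that representations occupy only a few directions of their embedding space. One common generic diagnostic is the participation ratio of the covariance matrix C, (tr C)^2 / tr(C^2), which ranges from 1 (all variance on one direction) to d (variance spread evenly over d dimensions). The sketch below is an illustration only; the listing does not say which collapse metric the paper uses, and the function name `participation_ratio` is hypothetical.

```python
def participation_ratio(rows):
    """Effective dimensionality (tr C)^2 / tr(C^2) of the covariance C of `rows`.

    `rows` is a list of equal-length embedding vectors (lists of floats).
    This is a generic diagnostic, assumed here for illustration; it is not
    claimed to be the metric used in the paper.
    """
    n, d = len(rows), len(rows[0])
    # Center each dimension.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Covariance matrix C (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    trace = sum(cov[a][a] for a in range(d))
    # For symmetric C, tr(C^2) equals the sum of squared entries.
    trace_sq = sum(cov[a][b] ** 2 for a in range(d) for b in range(d))
    return trace * trace / trace_sq if trace_sq > 0 else 0.0


spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]       # uses both axes
collapsed = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]  # lies on one line
print(participation_ratio(spread), participation_ratio(collapsed))
```

The first set spreads variance evenly over both dimensions, while the second lies on a single line, so its effective dimensionality falls to the minimum.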
Innovation

Methods, ideas, or system contributions that make the work stand out.

TokenFormer
Sequential Collapse Propagation
BFTS attention
Non-Linear Interaction Representation
unified recommendation architecture