HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently and accurately modeling ultra-long user behavior sequences in recommender systems, where existing approaches face a trade-off between computational efficiency and modeling precision: linear attention suffers from limited state capacity, leading to imprecise retrieval, while softmax-based attention incurs prohibitive computational cost. To overcome this, the authors propose a hybrid attention architecture that decouples long-term stable preferences from short-term interest dynamics, applying linear attention to the long historical sequence and softmax attention to recent behaviors. A Temporal-Aware Delta Network (TADN) is further introduced to dynamically amplify fresh signals and suppress historical noise. The resulting model maintains linear inference complexity and achieves a Hit Rate improvement of over 8% on industrial-scale ultra-long-sequence recommendation tasks, significantly outperforming strong baselines.
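The TADN's role of amplifying fresh signals while damping stale ones can be illustrated with a simple recency gate. The paper does not specify its form; the exponential decay below, the function name `temporal_gate`, and the time constant `tau` are all assumptions for the sketch, not details from HyTRec.

```python
import numpy as np

def temporal_gate(embeddings, timestamps, now, tau=86400.0):
    """Gate behavior embeddings by recency (illustrative, not HyTRec's TADN).

    embeddings: (L, d) array of interaction embeddings
    timestamps: length-L sequence of interaction times (seconds)
    now:        current time (seconds)
    tau:        assumed decay constant (here: one day)
    """
    age = now - np.asarray(timestamps, dtype=float)
    # Fresh interactions get weight near 1; stale ones decay toward 0,
    # suppressing historical noise in the aggregated representation.
    gate = np.exp(-age / tau)
    return embeddings * gate[:, None], gate
```

In the actual model the gating is learned rather than a fixed decay, but the effect is the same: recent behaviors dominate the state update, which counters the lag of the linear-attention branch in tracking rapid interest drift.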

📝 Abstract
Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a Hybrid Attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval within industrial-scale contexts involving ten thousand interactions. To mitigate the lag in capturing rapid interest drift within the linear layers, we further design the Temporal-Aware Delta Network (TADN), which dynamically upweights fresh behavioral signals while effectively suppressing historical noise. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed and outperforms strong baselines, notably delivering over an 8% improvement in Hit Rate for users with ultra-long sequences.
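The two-branch design in the abstract can be sketched compactly: a kernelized linear-attention branch compresses the long history into a fixed-size state, while an exact softmax branch handles the recent window. This is a minimal illustration of the general hybrid pattern; the function names, the ReLU feature map, the window size, and the equal-weight fusion are assumptions, not HyTRec's actual parameterization.

```python
import numpy as np

def softmax_attention(q, k, v):
    # Exact scaled dot-product attention over the recent window.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v):
    # Kernelized linear attention: the whole history is folded into a
    # (d x d) state, so cost is linear in sequence length.
    feat = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map (assumed)
    qf, kf = feat(q), feat(k)
    state = kf.T @ v            # (d, d) compressed summary of the history
    norm = kf.sum(axis=0)       # (d,) normalizer
    return (qf @ state) / (qf @ norm)[:, None]

def hybrid_attention(q, keys, values, recent_window=64):
    # Split the sequence: linear branch over the long history,
    # softmax branch over the last `recent_window` interactions.
    k_hist, v_hist = keys[:-recent_window], values[:-recent_window]
    k_rec, v_rec = keys[-recent_window:], values[-recent_window:]
    out_hist = linear_attention(q, k_hist, v_hist)
    out_rec = softmax_attention(q, k_rec, v_rec)
    return 0.5 * (out_hist + out_rec)  # simple fusion; learned gate in practice
```

Because only the short recent window pays the quadratic softmax cost, overall inference stays linear in the total sequence length, which is what makes the ten-thousand-interaction regime tractable.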
Problem

Research questions and friction points this paper is trying to address.

long behavior sequence
sequential recommendation
attention mechanism
computational efficiency
retrieval precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Attention
Temporal-Aware Delta Network
Long Sequential Recommendation
Linear Attention
Softmax Attention