🤖 AI Summary
To address challenges in modeling ultra-long user behavior sequences in industrial recommendation systems—including difficulty in jointly capturing long- and short-term preferences, inconsistency between upstream and downstream modules, and low computational efficiency—this paper proposes an end-to-end, GPU-optimized long-sequence Transformer architecture. Key contributions include: (1) a global token mechanism enabling stable long-range attention; (2) hierarchical token compression via a lightweight InnerTransformer coupled with hybrid attention; and (3) a fully synchronized GPU training and inference framework supporting unified dense/sparse parameter updates. The method integrates mixed-precision training, activation recomputation, and KV cache optimization. Evaluated on ByteDance’s advertising and e-commerce platforms, it achieves significant offline metric improvements and yields an average +2.1% CTR gain in online A/B tests. The system has been deployed across 10+ core business services, serving over one billion users.
📝 Abstract
Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including mixed-precision training with activation recomputation, KV cache serving, and a fully synchronous training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B tests across advertising and e-commerce services at ByteDance, validating its effectiveness and industrial-level scaling laws. LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving over a billion users.
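The core idea behind (i) and (ii) can be illustrated with a minimal sketch: compress the long behavior sequence into merged tokens, prepend global tokens, and let only a short window of recent behaviors attend to this reduced context. This is a simplified illustration, not the paper's implementation; in particular, mean pooling stands in for the lightweight InnerTransformer, the global tokens would be learnable in practice, and all shapes and names (`merge_tokens`, `group_size`) are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_tokens(seq, group_size):
    # Token merge: split the long sequence into fixed-size groups and
    # compress each group into one token. LONGER uses a lightweight
    # InnerTransformer per group; mean pooling stands in for it here.
    length, dim = seq.shape
    pad = (-length) % group_size
    if pad:
        seq = np.vstack([seq, np.zeros((pad, dim))])
    return seq.reshape(-1, group_size, dim).mean(axis=1)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

# Toy ultra-long behavior sequence: 4096 items, 32-dim embeddings.
rng = np.random.default_rng(0)
seq = rng.standard_normal((4096, 32))

merged = merge_tokens(seq, group_size=8)       # 4096 -> 512 tokens
global_tokens = rng.standard_normal((4, 32))   # learnable in practice
ctx = np.vstack([global_tokens, merged])       # 516-token context

# Hybrid attention (simplified): a short window of recent behaviors
# attends to global tokens + compressed history, so the attention cost
# is 64 x 516 instead of 4096 x 4096.
recent = seq[-64:]
out = attention(recent, ctx, ctx)
print(out.shape)
```

The key design point is that the quadratic attention cost is paid only over the compressed context, while the global tokens give every query a stable anchor for long-range information.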