🤖 AI Summary
This work addresses the challenge of jointly modeling long user behavior sequences and heterogeneous non-sequential features under stringent efficiency constraints in industrial-scale recommender systems. To this end, we propose HyFormer, a unified hybrid Transformer architecture that, for the first time, integrates sequence modeling and feature interaction within a single backbone network. The core innovation is an alternating optimization mechanism comprising Query Decoding, which leverages hierarchical key-value representations for efficient long-sequence decoding, and Query Boosting, which enables cross-query and cross-sequence token mixing for dynamic semantic enhancement. Evaluated on billion-scale industrial datasets, HyFormer significantly outperforms the state-of-the-art methods LONGER and RankMixer under identical parameter and FLOPs budgets, with online A/B tests confirming substantial performance gains in high-traffic deployment scenarios.
📝 Abstract
Industrial large-scale recommendation models (LRMs) face the challenge of jointly modeling long-range user behavior sequences and heterogeneous non-sequential features under strict efficiency constraints. Most existing architectures, however, employ a decoupled pipeline: long sequences are first compressed with a query-token-based sequence compressor such as LONGER, and the result is then fused with dense features through token-mixing modules such as RankMixer, which limits both representation capacity and interaction flexibility. This paper presents HyFormer, a unified hybrid Transformer architecture that tightly integrates long-sequence modeling and feature interaction into a single backbone. From the perspective of sequence modeling, we revisit and redesign query tokens in LRMs, framing the LRM modeling task as an alternating optimization process with two core components: Query Decoding, which expands non-sequential features into Global Tokens and performs long-sequence decoding over layer-wise key-value representations of long behavioral sequences; and Query Boosting, which enhances cross-query and cross-sequence heterogeneous interactions via efficient token mixing. These two complementary mechanisms are applied iteratively to refine semantic representations across layers. Extensive experiments on billion-scale industrial datasets demonstrate that HyFormer consistently outperforms strong LONGER and RankMixer baselines under comparable parameter and FLOPs budgets, while exhibiting superior scaling behavior with increasing parameters and FLOPs. Large-scale online A/B tests in high-traffic production systems further validate its effectiveness, showing significant gains over deployed state-of-the-art models. These results highlight the practicality and scalability of HyFormer as a unified modeling framework for industrial LRMs.
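To make the alternating process concrete, below is a minimal NumPy sketch of the two mechanisms as described in the abstract: Query Decoding cross-attends Global Tokens (expanded from non-sequential features) over a key-value view of the long behavior sequence, and Query Boosting mixes information across the query tokens before the next layer repeats the cycle. All function names, weight shapes, layer counts, and the simple tanh token mixer are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_decoding(global_tokens, seq, w_q, w_k, w_v):
    # Cross-attention: Global Tokens (queries) decode this layer's
    # key-value representation of the long behavior sequence.
    q = global_tokens @ w_q                          # (T, d)
    k, v = seq @ w_k, seq @ w_v                      # (L, d) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T, L)
    return global_tokens + attn @ v                  # residual update

def query_boosting(global_tokens, w_mix):
    # Token mixing across the query dimension (a hypothetical
    # stand-in for the paper's cross-query/cross-sequence mixer).
    return global_tokens + np.tanh(w_mix @ global_tokens)

rng = np.random.default_rng(0)
T, L, d = 4, 64, 16                    # global tokens, seq length, hidden dim
tokens = rng.normal(size=(T, d))       # Global Tokens from non-sequential features
seq = rng.normal(size=(L, d))          # long user behavior sequence

for _ in range(3):                     # alternate the two mechanisms per layer
    w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    tokens = query_decoding(tokens, seq, w_q, w_k, w_v)
    w_mix = rng.normal(size=(T, T)) * 0.1
    tokens = query_boosting(tokens, w_mix)

print(tokens.shape)  # (4, 16)
```

The key property this sketch illustrates is that the sequence is never compressed once and discarded: every layer re-decodes the full key-value representation, so feature interaction (boosting) and sequence modeling (decoding) refine each other iteratively rather than running as a decoupled pipeline.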