🤖 AI Summary
In industrial-scale CTR prediction, conventional DNNs struggle to model heterogeneous features (e.g., user profiles, item attributes, behavioral sequences). To address this, we propose HHFT, a Hierarchical Heterogeneous Feature Transformer architecture. Its key contributions are: (1) a semantics-driven feature partitioning mechanism with block-specific QKV projections, preventing semantic interference across feature types; and (2) synergistic modeling via heterogeneous Transformer encoders and lightweight Hiformer layers, explicitly capturing high-order cross-type interactions. HHFT departs from traditional homogeneous modeling paradigms, enabling fine-grained semantic awareness and efficient cross-domain feature fusion. Deployed in Taobao's production environment, HHFT achieves a +0.4% improvement in CTR AUC and a +0.6% increase in GMV, delivering significant gains in core business metrics.
📝 Abstract
We propose HHFT (Hierarchical Heterogeneous Feature Transformer), a Transformer-based architecture tailored for industrial CTR prediction. HHFT addresses the limitations of DNNs through three key designs: (1) Semantic Feature Partitioning: grouping heterogeneous features (e.g., user profile, item information, behaviour sequence) into semantically coherent blocks to preserve domain-specific information; (2) Heterogeneous Transformer Encoder: adopting block-specific QKV projections and FFNs to avoid semantic confusion between distinct feature types; (3) Hiformer Layer: capturing high-order interactions across features. Our findings reveal that Transformers significantly outperform DNN baselines, achieving a +0.4% improvement in CTR AUC at scale. We have successfully deployed the model on Taobao's production platform, observing a significant uplift in key business metrics, including a +0.6% increase in Gross Merchandise Value (GMV).
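To illustrate the core idea of design (2), the sketch below shows heterogeneous self-attention with block-specific QKV projections. This is a minimal illustration under assumed details, not the paper's implementation: the block names (`user`, `item`, `sequence`), token counts, and dimension are hypothetical, and the real model would add multi-head attention, block-specific FFNs, residual connections, and normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding dimension (hypothetical)

# Token counts per semantic block (hypothetical partitioning).
blocks = {"user": 2, "item": 3, "sequence": 5}

# Block-specific QKV projection matrices: one (Wq, Wk, Wv) triple per
# block, instead of a single shared triple as in a homogeneous Transformer.
proj = {
    name: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    for name in blocks
}

def heterogeneous_attention(tokens):
    """tokens: dict mapping block name -> (n_block, d) embedding matrix."""
    qs, ks, vs = [], [], []
    for name, x in tokens.items():
        wq, wk, wv = proj[name]          # projections chosen by block type
        qs.append(x @ wq)
        ks.append(x @ wk)
        vs.append(x @ wv)
    # Concatenate all blocks so attention spans every feature type,
    # letting the model capture cross-block interactions.
    q, k, v = (np.concatenate(m, axis=0) for m in (qs, ks, vs))
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                   # (total_tokens, d)

tokens = {name: rng.standard_normal((n, d)) for name, n in blocks.items()}
out = heterogeneous_attention(tokens)
print(out.shape)  # (10, 16)
```

Because each block keeps its own projections, user-profile and behaviour-sequence embeddings are mapped into the shared attention space by separate learned transforms, which is what prevents the semantic confusion a single shared QKV projection would introduce.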