🤖 AI Summary
This paper addresses key challenges in deploying Transformer-based ranking models at industrial scale: high engineering overhead, heavy reliance on handcrafted features, and the trade-off between recommendation diversity and accuracy. To this end, the authors propose LiGR, LinkedIn's production ranking framework. Its core contributions are threefold: (1) a set-wise joint attention mechanism that scores candidate items jointly, conditioned on user history, achieving automatic diversity optimization in production-scale ranking for the first time; (2) learned normalization and single-pass user-history encoding, which eliminate hundreds of engineered features and outperform the prior system using only raw behavioral signals; and (3) empirical validation of scaling laws for ranking models, demonstrating consistent gains from larger architectures, longer context sequences, and more training data. Online A/B tests show statistically significant improvements in both click-through rate and recommendation diversity, and LiGR has been deployed in production across LinkedIn's recommendation system.
📝 Abstract
We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the deprecation of most manually designed feature engineering, outperforming the prior state-of-the-art system using only a few features (compared to hundreds in the baseline), (2) validation of the scaling law for ranking systems, showing improved performance with larger models, more training data, and longer context sequences, and (3) simultaneous joint scoring of items in a set-wise manner, leading to automated improvements in diversity. To enable efficient serving of large ranking models, we describe techniques to scale inference effectively using single-pass processing of user history and set-wise attention. We also summarize key insights from various ablation studies and A/B tests, highlighting the most impactful technical approaches.
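To make the set-wise idea concrete, here is a minimal NumPy sketch of joint candidate scoring: each candidate's query attends over both the user history and the other candidates in the slate, so every item's score depends on the whole set. This is an illustrative single-head sketch under assumed shapes, not LiGR's actual architecture; all names (`setwise_scores`, the weight matrices `Wq`/`Wk`/`Wv`, `w_out`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def setwise_scores(history, candidates, Wq, Wk, Wv, w_out):
    """Score all candidates jointly: each candidate attends to the
    user history AND to the other candidates in the set (illustrative)."""
    context = np.concatenate([history, candidates], axis=0)   # (H+N, d)
    q = candidates @ Wq                                       # (N, d)
    k = context @ Wk                                          # (H+N, d)
    v = context @ Wv                                          # (H+N, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (N, H+N)
    pooled = attn @ v                                         # (N, d)
    return pooled @ w_out                                     # (N,) one score per candidate

rng = np.random.default_rng(0)
d = 8
history = rng.normal(size=(5, d))    # 5 past user actions (embeddings)
cands = rng.normal(size=(3, d))      # 3 candidate items to rank
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_out = rng.normal(size=(d,))

s1 = setwise_scores(history, cands, Wq, Wk, Wv, w_out)

# Set-wise property: perturbing item 2 also shifts item 0's score,
# because candidates attend to one another (unlike point-wise scoring).
cands2 = cands.copy()
cands2[2] += 1.0
s2 = setwise_scores(history, cands2, Wq, Wk, Wv, w_out)
```

Because scores are conditioned on the rest of the slate, a near-duplicate of an already-present item can receive a lower score, which is the mechanism behind the automated diversity gains described above.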