AI Summary
To address the explosive embedding storage requirements and prohibitive real-time inference latency induced by ultra-long user behavior sequences (10³ to 10⁴ events) in industrial advertising recommendation, this paper proposes SIDRec, a novel framework comprising three tightly integrated components. First, it introduces VQ Fusion, a multi-task vector-quantized variational autoencoder (VQ-VAE) architecture that fuses multiple content embeddings and categorical predictions into a single Semantic ID (SID). Second, it designs SIDE, a parameter-free, fine-grained SID-to-embedding mapping mechanism that removes the need for a large parameterized lookup table. Third, it proposes Discrete-PCA (DPCA), a quantization method that generalizes and enhances residual quantization. Collectively, these components replace collections of high-dimensional raw embeddings with compact SIDs, preserving model expressiveness while reducing computational and storage costs. Experiments on a production advertising system show that, compared to traditional SID methods, SIDRec achieves a 2.4× improvement in normalized entropy (NE) gain and a 3× reduction in data footprint.
Abstract
Sequence-based recommendation models are driving the state of the art for industrial ad-recommendation systems. Such systems typically deal with user histories, or sequence lengths, on the order of O(10^3) to O(10^4) events. While adding embeddings at this scale is manageable in pre-trained models, incorporating them into real-time prediction models is challenging due to both storage and inference costs. To address this scaling challenge, we propose a novel approach that leverages vector quantization (VQ) to inject a compact Semantic ID (SID) as input to the recommendation models instead of a collection of embeddings. Our method builds on recent work on SIDs by introducing three key innovations: (i) a multi-task VQ-VAE framework, called VQ fusion, that fuses multiple content embeddings and categorical predictions into a single Semantic ID; (ii) a parameter-free, highly granular SID-to-embedding conversion technique, called SIDE, that is validated with two content embedding collections, thereby eliminating the need for a large parameterized lookup table; and (iii) a novel quantization method called Discrete-PCA (DPCA), which generalizes and enhances residual quantization techniques. The proposed enhancements, when applied to a large-scale industrial ads-recommendation system, achieve a 2.4X improvement in normalized entropy (NE) gain and a 3X reduction in data footprint compared to traditional SID methods.
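For readers unfamiliar with the baseline that DPCA is said to generalize, the following is a minimal sketch of plain residual quantization, in which each level quantizes what the previous level left over and the resulting tuple of codeword indices serves as the Semantic ID. It is not the paper's DPCA, VQ fusion, or SIDE; the function names (`nearest`, `rq_encode`, `rq_decode`) and the toy codebooks are hypothetical and chosen only for illustration.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def dist2(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def rq_encode(codebooks, vec):
    """Residual quantization: pick one codeword per level, then pass the
    remaining residual to the next level. Returns the tuple of indices."""
    residual = list(vec)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def rq_decode(codebooks, sid):
    """Approximate the original vector by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, sid):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: 2 levels, 2 codewords each, 2-dimensional vectors.
toy_codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],   # level 1: coarse codewords
    [[0.5, 0.0], [0.0, 0.5]],   # level 2: finer codewords for the residual
]
item_embedding = [1.0, 0.5]

sid = rq_encode(toy_codebooks, item_embedding)   # (0, 1)
approx = rq_decode(toy_codebooks, sid)           # [1.0, 0.5]
```

In this toy case the two-integer SID reconstructs the embedding exactly; in practice codebooks are learned and the reconstruction is approximate, which is the fidelity gap that improved quantization methods aim to narrow.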