SIDE: Semantic ID Embedding for effective learning from sequences

📅 2025-06-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Ultra-long user behavior sequences (on the order of 10³ to 10⁴ events) in industrial advertising recommendation make embedding storage and real-time inference costly. To address this, the paper proposes SIDRec, a framework with three components. First, VQ Fusion is a multi-task vector-quantized variational autoencoder (VQ-VAE) that fuses multiple content embeddings and categorical predictions into a single Semantic ID (SID). Second, SIDE is a parameter-free, highly granular SID-to-embedding conversion technique that removes the need for a large parameterized lookup table. Third, Discrete-PCA (DPCA) is a quantization method that generalizes and enhances residual quantization. Together, these components feed the recommendation model a compact SID instead of a collection of embeddings. Applied to a large-scale industrial ads-recommendation system, the approach achieves a 2.4× improvement in normalized entropy (NE) gain and a 3× reduction in data footprint compared to traditional SID methods.

📝 Abstract

Sequence-based recommendation models are driving the state of the art for industrial ad-recommendation systems. Such systems typically deal with user histories, or sequence lengths, on the order of O(10^3) to O(10^4) events. While adding embeddings at this scale is manageable in pre-trained models, incorporating them into real-time prediction models is challenging due to both storage and inference costs. To address this scaling challenge, we propose a novel approach that leverages vector quantization (VQ) to inject a compact Semantic ID (SID) as input to the recommendation models instead of a collection of embeddings. Our method builds on recent work on SIDs by introducing three key innovations: (i) a multi-task VQ-VAE framework, called VQ fusion, that fuses multiple content embeddings and categorical predictions into a single Semantic ID; (ii) a parameter-free, highly granular SID-to-embedding conversion technique, called SIDE, that is validated with two content embedding collections, thereby eliminating the need for a large parameterized lookup table; and (iii) a novel quantization method called Discrete-PCA (DPCA), which generalizes and enhances residual quantization techniques. The proposed enhancements, when applied to a large-scale industrial ads-recommendation system, achieve a 2.4X improvement in normalized entropy (NE) gain and a 3X reduction in data footprint compared to traditional SID methods.
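
For readers unfamiliar with the residual quantization that the abstract says DPCA generalizes, the following is a minimal sketch of plain residual quantization, not of DPCA, VQ fusion, or SIDE. The function names, the two-level layout, and the toy codebooks are illustrative assumptions; in practice the codebooks would be learned rather than written by hand.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    vector: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize `vector` level by level.

    At each level, pick the codeword nearest to the current residual,
    record its index, and subtract it. The tuple of indices plays the
    role of a Semantic ID; the sum of chosen codewords approximates
    the original vector.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(residual)
    codes = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        chosen = codebook[index]
        codes.append(index)
        reconstruction = [r + c for r, c in zip(reconstruction, chosen)]
        residual = [r - c for r, c in zip(residual, chosen)]
    return tuple(codes), reconstruction


# Hypothetical toy codebooks (assumption): two levels over 2-D vectors.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],   # coarse level
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine level
]

semantic_id, approx = residual_quantize([1.2, 1.0], CODEBOOKS)
print(semantic_id, approx)
```

Each additional level refines the approximation left over by the previous one, which is why a short tuple of small integers can stand in for a dense embedding.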

Problem

Research questions and friction points this paper is trying to address.

Scale challenges in real-time ad-recommendation models

High storage and inference costs in sequence embeddings

Need for compact Semantic IDs in recommendation systems
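
A back-of-the-envelope sketch can show why per-event embeddings become expensive at these sequence lengths. Every number below except the sequence length is an assumption chosen for illustration (embedding width, float size, number of codes per SID, bytes per code); none of them is taken from the paper.

```python
def sequence_bytes(num_events: int, bytes_per_event: int) -> int:
    """Raw bytes needed to store one user's sequence."""
    return num_events * bytes_per_event


# Illustrative assumptions, not figures from the paper.
SEQ_LEN = 10_000       # upper end of the O(10^4) range quoted in the abstract
EMBED_DIM = 64         # assumed embedding width
BYTES_PER_FLOAT = 4    # float32
SID_LEVELS = 4         # assumed number of codes per Semantic ID
BYTES_PER_CODE = 1     # assumes each codebook has at most 256 entries

embedding_bytes = sequence_bytes(SEQ_LEN, EMBED_DIM * BYTES_PER_FLOAT)
sid_bytes = sequence_bytes(SEQ_LEN, SID_LEVELS * BYTES_PER_CODE)

print(f"embeddings: {embedding_bytes:,} bytes per user")
print(f"SIDs:       {sid_bytes:,} bytes per user")
print(f"ratio:      {embedding_bytes // sid_bytes}x")
```

The ratio here compares dense embeddings with SIDs under the assumed sizes. It is a different comparison from the 3X data-footprint reduction reported in the abstract, which is measured against traditional SID methods.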

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task VQ-VAE framework for Semantic ID fusion

Parameter-free SID-to-embedding conversion technique

Discrete-PCA quantization enhancing residual methods

🔎 Similar Papers

STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM