MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vector quantization–based item indexing methods struggle to handle the highly skewed and non-stationary item distributions prevalent in streaming recommendation systems, resulting in low assignment accuracy, imbalanced clustering, and insufficient inter-cluster separation. To address these challenges, this work proposes MERGE, an adaptive hierarchical item indexing paradigm that dynamically constructs clusters from scratch, continuously monitors cluster occupancy in real time, and employs a fine-to-coarse merging strategy to build a hierarchical structure. MERGE substantially improves assignment accuracy, cluster uniformity, and inter-cluster separation. Online A/B tests further demonstrate its significant gains on key business metrics, validating its practical effectiveness in real-world streaming recommendation scenarios.

📝 Abstract
Item indexing, which maps a large corpus of items into compact discrete representations, is critical for both discriminative and generative recommender systems. Yet existing Vector Quantization (VQ)-based approaches struggle with the highly skewed and non-stationary item distributions common in streaming industry recommenders, leading to poor assignment accuracy, imbalanced cluster occupancy, and insufficient cluster separation. To address these challenges, we propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging. Extensive experiments demonstrate that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation compared with existing indexing methods, while online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.
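
The three steps named in the abstract (building clusters from scratch, monitoring occupancy, and fine-to-coarse merging) can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract gives no implementation details, so the cosine-similarity threshold, the hard occupancy cap, the running-mean centroid update, and the greedy merge rule below are all assumptions chosen for illustration, as are the names `StreamingIndex`, `assign`, and `merge_level`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class StreamingIndex:
    """Hypothetical fine-level index: clusters are opened on demand."""

    def __init__(self, sim_threshold=0.9, max_occupancy=4):
        self.sim_threshold = sim_threshold  # assumed assignment criterion
        self.max_occupancy = max_occupancy  # assumed balance criterion
        self.centroids = []                 # running mean per cluster
        self.counts = []                    # monitored occupancy per cluster

    def assign(self, vec):
        # Find the most similar existing centroid, if any.
        best, best_sim = -1, -1.0
        for i, c in enumerate(self.centroids):
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = i, s
        # Open a new cluster when nothing is close enough or the best is full.
        if (best < 0 or best_sim < self.sim_threshold
                or self.counts[best] >= self.max_occupancy):
            self.centroids.append(list(vec))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Otherwise join the cluster and update its running mean.
        n = self.counts[best]
        self.centroids[best] = [(c * n + x) / (n + 1)
                                for c, x in zip(self.centroids[best], vec)]
        self.counts[best] = n + 1
        return best


def merge_level(centroids, sim_threshold):
    """Greedily merge fine centroids into coarse parents.

    Returns (parent_centroids, mapping), where mapping[i] is the coarse
    parent id of fine cluster i.
    """
    parents, sizes, mapping = [], [], []
    for c in centroids:
        best, best_sim = -1, -1.0
        for j, p in enumerate(parents):
            s = cosine(c, p)
            if s > best_sim:
                best, best_sim = j, s
        if best >= 0 and best_sim >= sim_threshold:
            n = sizes[best]
            parents[best] = [(p * n + x) / (n + 1)
                             for p, x in zip(parents[best], c)]
            sizes[best] = n + 1
            mapping.append(best)
        else:
            parents.append(list(c))
            sizes.append(1)
            mapping.append(len(parents) - 1)
    return parents, mapping


# Toy stream of 2-D item embeddings (made-up values).
stream = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99], [-1.0, 0.0]]
index = StreamingIndex(sim_threshold=0.9, max_occupancy=4)
fine_ids = [index.assign(v) for v in stream]
coarse_centroids, fine_to_coarse = merge_level(index.centroids, sim_threshold=0.0)
# Each item's hierarchical code is (coarse id, fine id).
codes = [(fine_to_coarse[f], f) for f in fine_ids]
```

In this sketch the fine level is built first and the coarse level is derived from it, which mirrors the fine-to-coarse direction described above; a production system would also need to handle centroid drift and cluster retirement, which are omitted here.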
Problem

Research questions and friction points this paper is trying to address.

item indexing
streaming recommendation
vector quantization
non-stationary distribution
cluster imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

item indexing
streaming recommendation
vector quantization
adaptive clustering
hierarchical indexing
🔎 Similar Papers
Jing Yan
Bytedance, Singapore, Singapore
Yimeng Bai
University of Science and Technology of China
Recommendation, Generative Recommendation, Large Language Model
Zongyu Liu
Bytedance, Singapore, Singapore
Yahui Liu
Bytedance, Shanghai, China
Junwei Wang
Bytedance, Shanghai, China
Jingze Huang
Bytedance, Shanghai, China
Haoda Li
Bytedance, Singapore, Singapore
Sihao Ding
Mercedes-Benz R&D North America
Computer Vision, Machine Learning
Shaohui Ruan
Bytedance, Shanghai, China
Yang Zhang
National University of Singapore
Recommendation, LLM Personalization, Trustworthy