🤖 AI Summary
To address the challenge of balancing real-time responsiveness and retrieval accuracy in large-scale recommender systems, this paper proposes Streaming Vector Quantization (Streaming VQ), a novel index structure. Streaming VQ is a lightweight indexing paradigm that attaches items to index entries in real time, so the index is built incrementally while the codebook and clusters keep evolving online. It also provides index balancing and reparability, which let it support complex ranking models as existing approaches do. Compared to conventional static indices, Streaming VQ achieves high-precision approximate nearest neighbor retrieval under sub-10 ms latency, with index update latency below 1 second. It has been deployed in Douyin and Douyin Lite, where it replaced all major retrievers and improved key user engagement metrics, including click-through rate and session duration.
📝 Abstract
Retrievers, which form one of the most important recommendation stages, are responsible for efficiently selecting possible positive samples for the later stages under strict latency limits. Because of this, large-scale systems typically rely on approximate calculations and indexes to coarsely shrink the candidate set, paired with a simple ranking model. Since simple models cannot produce precise predictions, most existing methods focus on incorporating more complicated ranking models. However, another fundamental problem, index effectiveness, remains unresolved and also bottlenecks the move to more complicated models. In this paper, we propose a novel index structure, the streaming Vector Quantization (streaming VQ) model, as a new generation of retrieval paradigm. Streaming VQ attaches items to indexes in real time, granting it immediacy. Moreover, through meticulous verification of possible variants, it achieves additional benefits such as index balancing and reparability, enabling it to support complicated ranking models as existing approaches do. As a lightweight and implementation-friendly architecture, streaming VQ has been deployed in Douyin and Douyin Lite, where it has replaced all major retrievers, resulting in remarkable user engagement gains.
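To make the idea of attaching items to an index in real time more concrete, here is a minimal sketch in Python. It is not the paper's model. It assumes a plain nearest-centroid assignment with an exponential moving average (EMA) codebook update, a common online clustering technique, and the names `StreamingCodebook`, `assign`, `retrieve`, and `decay` are hypothetical and chosen only for illustration.

```python
class StreamingCodebook:
    """Toy online vector quantizer (illustrative only, not the paper's method)."""

    def __init__(self, centroids, decay=0.99):
        # Codebook: one centroid vector per cluster.
        self.centroids = [list(c) for c in centroids]
        # Assumed EMA decay; higher values make centroids move more slowly.
        self.decay = decay
        # The "index": item id -> cluster id, overwritten on every new sample.
        self.item_to_cluster = {}

    def _nearest(self, vec):
        # Index of the centroid with the smallest squared Euclidean distance.
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(centroid, vec))
        return min(range(len(self.centroids)),
                   key=lambda k: sq_dist(self.centroids[k]))

    def assign(self, item_id, embedding):
        # Attach the item to its nearest cluster as soon as a sample arrives.
        k = self._nearest(embedding)
        d = self.decay
        # Nudge the winning centroid toward the new embedding (EMA update).
        self.centroids[k] = [d * c + (1 - d) * e
                             for c, e in zip(self.centroids[k], embedding)]
        self.item_to_cluster[item_id] = k
        return k

    def retrieve(self, query):
        # Return every item currently attached to the query's nearest cluster.
        k = self._nearest(query)
        return [item for item, c in self.item_to_cluster.items() if c == k]


index = StreamingCodebook(centroids=[[0.0, 0.0], [10.0, 10.0]], decay=0.9)
index.assign("video_1", [9.0, 11.0])   # nearest to the second centroid
index.assign("video_2", [1.0, -1.0])   # nearest to the first centroid
index.assign("video_1", [0.5, 0.5])    # a fresh embedding re-attaches the item
candidates = index.retrieve([0.0, 0.0])
```

Because each incoming sample overwrites the item's cluster assignment, an item whose embedding drifts is moved to a new cluster on its next sample rather than waiting for a periodic index rebuild. This is one simple way to read the immediacy and reparability properties described above.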