🤖 AI Summary
Traditional inverted indexes rely on exact term matching, which limits their ability to generalize and leaves a pronounced semantic gap between queries and documents. To address this, we propose UniDex, a framework that, for the first time, moves inverted indexing from token-level matching to end-to-end semantic modeling. UniDex uses deep encoding and contrastive learning to generate semantically grounded identifiers, from which it builds a semantic inverted index. It further introduces two tightly integrated modules, UniTouch for semantic retrieval and UniRank for semantic re-ranking, so that recall and ranking can be optimized jointly. This design substantially reduces reliance on hand-crafted rules and the maintenance overhead they bring. Evaluated on Kuaishou’s large-scale short-video search system under real production traffic, UniDex achieves substantial improvements in retrieval accuracy and zero-shot generalization while serving hundreds of millions of active users.
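The summary says UniDex learns its identifiers with contrastive learning but does not name the objective. As a hedged illustration only, the sketch below assumes the common in-batch InfoNCE loss: each query is pulled toward its paired document and pushed away from the other documents in the batch. The function names `cosine` and `info_nce` and the temperature of 0.1 are hypothetical choices, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce(queries: List[Vector], docs: List[Vector], temperature: float = 0.1) -> float:
    """Assumed in-batch InfoNCE loss: docs[i] is the positive for queries[i],
    and every other document in the batch acts as a negative."""
    assert queries and len(queries) == len(docs)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log of the softmax probability assigned to the positive.
        total += log_z - logits[i]
    return total / len(queries)


# Toy usage: correctly paired embeddings give a much lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss drives related queries and documents toward nearby embeddings, which is what lets identifiers derived from those embeddings group semantically similar items even when they share no surface tokens.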
📝 Abstract
Inverted indexing has traditionally been a cornerstone of modern search systems, using exact term matches to determine the relevance between queries and documents. However, this term-based approach emphasizes surface-level token overlap, which limits the system's ability to generalize and its retrieval effectiveness. To address these challenges, we propose UniDex, a model-based method that applies unified semantic modeling to inverted indexing. UniDex replaces complex manual designs with a streamlined architecture, improving semantic generalization while reducing maintenance overhead. The approach has two key components: UniTouch, which maps queries and documents into semantic IDs for retrieval, and UniRank, which ranks the retrieved results by semantic matching. Experiments on large-scale industrial datasets and evaluations on real-world online traffic show that UniDex significantly improves retrieval, marking a paradigm shift from term-based to model-based indexing. Deployment in Kuaishou's short-video search system further validates UniDex's practical effectiveness, efficiently serving hundreds of millions of active users.
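To make the two-stage idea concrete, here is a minimal sketch of a semantic inverted index, assuming that semantic IDs are obtained by nearest-centroid assignment against a fixed codebook. This is a swapped-in, deliberately simple technique: the paper learns its IDs end to end, and the names `assign_sids` and `SemanticIndex`, the codebook, and the toy 2-D embeddings below are all hypothetical stand-ins for a trained encoder. The retrieval step plays the role the abstract gives UniTouch (posting-list lookup by semantic ID), and the re-scoring step plays the role it gives UniRank.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def assign_sids(vec: Vector, codebook: List[Vector], k: int) -> List[int]:
    """Hypothetical UniTouch-style step: map an embedding to the indices of its
    k most similar codebook centroids, used as its semantic IDs."""
    ranked = sorted(range(len(codebook)), key=lambda c: -cosine(vec, codebook[c]))
    return ranked[:k]


class SemanticIndex:
    """Inverted index keyed by semantic IDs instead of surface terms."""

    def __init__(self, codebook: List[Vector], k: int = 1) -> None:
        self.codebook = codebook
        self.k = k
        self.postings: Dict[int, Set[str]] = defaultdict(set)
        self.doc_vecs: Dict[str, Vector] = {}

    def add(self, doc_id: str, vec: Vector) -> None:
        # Index time: store the document under each of its semantic IDs.
        self.doc_vecs[doc_id] = vec
        for sid in assign_sids(vec, self.codebook, self.k):
            self.postings[sid].add(doc_id)

    def search(self, query_vec: Vector, top_n: int = 10) -> List[Tuple[str, float]]:
        # Retrieval: union of the posting lists for the query's semantic IDs.
        candidates: Set[str] = set()
        for sid in assign_sids(query_vec, self.codebook, self.k):
            candidates |= self.postings.get(sid, set())
        # Ranking: hypothetical UniRank-style re-scoring by embedding similarity.
        scored = [(d, cosine(query_vec, self.doc_vecs[d])) for d in candidates]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_n]


# Toy usage with hand-written embeddings standing in for a learned encoder.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
index = SemanticIndex(codebook, k=1)
index.add("budget smartphone", [0.8, 0.2])
index.add("pasta recipe", [-0.9, 0.1])
hits = index.search([0.9, 0.1])  # assumed embedding of the query "cheap phone"
# "cheap phone" shares no token with "budget smartphone", so an exact-term
# index would miss it, but both land on semantic ID 0 and it is retrieved.
shared_terms = set("cheap phone".split()) & set("budget smartphone".split())
```

Keeping retrieval as a posting-list lookup preserves the efficiency of a classic inverted index, while moving the matching key from tokens to learned IDs is what closes the semantic gap described above.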