UniDex: Rethinking Search Inverted Indexing with Unified Semantic Modeling

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional inverted indexes rely on exact term matching, which limits their generalization capability and leaves a pronounced semantic gap between queries and documents. To address this, we propose UniDex, a novel framework that, for the first time, elevates inverted indexing from token-level matching to end-to-end semantic modeling. UniDex uses deep encoding and contrastive learning to generate semantically grounded identifiers and builds a semantic inverted index on top of them. It further introduces two tightly integrated modules, UniTouch for semantic retrieval and UniRank for semantic re-ranking, which enable joint optimization of recall and ranking. This design significantly reduces reliance on hand-crafted rules and lowers maintenance overhead. Deployed in Kuaishou's large-scale short-video search system and evaluated under real production traffic, UniDex achieves substantial improvements in retrieval accuracy and zero-shot generalization while serving hundreds of millions of active users.
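The summary says UniDex trains its encoders with contrastive learning but does not give the objective. As an illustration only, the sketch below implements the standard in-batch InfoNCE loss, assuming each query is paired with one positive document and every other document in the batch acts as a negative. The function names and the temperature value are hypothetical, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce_loss(queries: List[Vector], docs: List[Vector], temperature: float = 0.05) -> float:
    """In-batch InfoNCE (assumed objective, not necessarily UniDex's exact loss).

    docs[i] is the positive for queries[i]; all other docs in the batch are negatives.
    Returns the mean negative log-probability assigned to each positive.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax(logits)[i]
    return total / len(queries)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # positives point the same way as their queries
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives point the wrong way
    print(info_nce_loss(q, aligned), info_nce_loss(q, swapped))
```

Minimizing this loss pulls each query toward its paired document and pushes it away from the in-batch negatives, so the aligned batch yields a much lower loss than the swapped one.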

Technology Category

Application Category

📝 Abstract
Inverted indexing has traditionally been a cornerstone of modern search systems, using exact term matches to determine relevance between queries and documents. However, this term-based approach emphasizes surface-level token overlap, which limits the system's generalization capability and retrieval effectiveness. To address these challenges, we propose UniDex, a novel model-based method that applies unified semantic modeling to fundamentally rethink inverted indexing. UniDex replaces complex manual designs with a streamlined architecture, enhancing semantic generalization while reducing maintenance overhead. Our approach has two key components: UniTouch, which maps queries and documents into semantic IDs for improved retrieval, and UniRank, which uses semantic matching to rank the retrieved results. Through evaluations on large-scale industrial datasets and real-world online traffic, we demonstrate that UniDex significantly improves retrieval capability, marking a paradigm shift from term-based to model-based indexing. Our deployment within Kuaishou's short-video search system further validates UniDex's practical effectiveness, efficiently serving hundreds of millions of active users.
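To make the contrast with term-based posting lists concrete, here is a minimal sketch of an inverted index keyed by semantic IDs rather than by tokens. It assumes, purely for illustration, that an item's semantic IDs are the indices of the `k` nearest centroids in a fixed codebook; the abstract does not describe how UniTouch actually produces its semantic IDs, so `assign_sids`, `SemanticInvertedIndex`, and the toy codebook are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set

Vector = List[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_sids(vec: Vector, codebook: List[Vector], k: int = 2) -> List[int]:
    """Hypothetical SID assignment: indices of the k nearest codebook centroids."""
    order = sorted(range(len(codebook)), key=lambda c: l2(vec, codebook[c]))
    return order[:k]


class SemanticInvertedIndex:
    """Posting lists keyed by semantic ID instead of by surface term (illustrative)."""

    def __init__(self, codebook: List[Vector], k: int = 2) -> None:
        self.codebook = codebook
        self.k = k
        self.postings: Dict[int, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, vec: Vector) -> None:
        """Append the document to the posting list of each of its SIDs."""
        for sid in assign_sids(vec, self.codebook, self.k):
            self.postings[sid].add(doc_id)

    def retrieve(self, query_vec: Vector) -> List[str]:
        """Union the postings of the query's SIDs, ordered by number of shared SIDs."""
        hits: Dict[str, int] = defaultdict(int)
        for sid in assign_sids(query_vec, self.codebook, self.k):
            for doc_id in self.postings.get(sid, ()):
                hits[doc_id] += 1
        return sorted(hits, key=lambda d: (-hits[d], d))


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    index = SemanticInvertedIndex(codebook, k=2)
    index.add("dog_clip", [0.9, 0.2])      # toy embeddings, not real model outputs
    index.add("puppy_video", [0.8, 0.3])
    index.add("cooking", [-0.9, -0.2])
    print(index.retrieve([0.85, 0.25]))    # both pet items share SIDs with the query
```

Unlike a term index, two items with no tokens in common (for example "dog clip" and "puppy video") land in the same posting lists whenever their embeddings are close, which is the kind of semantic generalization the abstract attributes to model-based indexing.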
Problem

Research questions and friction points this paper is trying to address.

Replacing term-based indexing with unified semantic modeling
Enhancing semantic generalization while reducing maintenance overhead
Improving retrieval effectiveness through model-based inverted indexing
Innovation

Methods, ideas, or system contributions that make the work stand out.

UniDex replaces manual designs with streamlined semantic architecture
UniTouch maps queries and documents into semantic IDs
UniRank employs semantic matching to rank results effectively
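UniRank's scoring model is not specified on this page. As a stand-in, the sketch below re-ranks a retrieved candidate set by cosine similarity between a query embedding and precomputed document embeddings; a learned ranker would replace `cosine` with its own scoring function. The names `rerank` and `top_n` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rerank(query_vec: Vector, candidates: Dict[str, Vector], top_n: int = 10) -> List[Tuple[str, float]]:
    """Score retrieved candidates against the query and keep the best top_n.

    Ties are broken by document id so the output order is deterministic.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    candidates = {"a": [0.6, 0.8], "b": [1.0, 0.1], "c": [0.0, 1.0]}
    for doc_id, score in rerank([1.0, 0.0], candidates, top_n=2):
        print(doc_id, round(score, 3))
```

Keeping retrieval (a cheap posting-list lookup) separate from re-ranking (a more expensive per-candidate score) is the usual two-stage split; the abstract's point is that UniDex optimizes both stages jointly rather than tuning them independently.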
🔎 Similar Papers
No similar papers found.
Zan Li (Xidian University; Covert Communications, Signal Processing)
Jiahui Chen (Kuaishou Technology, Beijing, China)
Yuan Chai (Kuaishou Technology, Beijing, China)
Xiaoze Jiang (Kuaishou Technology, Beijing, China)
Xiaohua Qi (Kuaishou Technology, Beijing, China)
Zhiheng Qin (Kuaishou Technology, Beijing, China)
Runbin Zhou (Kuaishou Technology, Beijing, China)
Shun Zuo (Kuaishou Technology, Beijing, China)
Guangchao Hao (Kuaishou Technology, Beijing, China)
Kefeng Wang (Kuaishou Technology, Beijing, China)
Jingshan Lv (Kuaishou Technology, Beijing, China)
Yupeng Huang (Kuaishou Technology, Beijing, China)
Xiao Liang (Kuaishou Technology, Beijing, China)
Han Li (Kuaishou Technology, Beijing, China)