Unbiased Multimodal Reranking for Long-Tail Short-Video Search

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of sparse user interactions in long-tail short-video search, which often leads retrieval models to favor low-quality content such as clickbait. To mitigate this bias, the authors propose a multimodal reranking framework powered by large language models (LLMs). The approach employs a two-stage training paradigm to construct an unbiased user experience estimator: first, LLMs synthesize high-quality supervision signals by integrating multimodal evidence for model fine-tuning; second, pairwise preference optimization learns partial orderings among videos, followed by reinforcement learning at inference time for page-level optimization. By incorporating LLMs’ world knowledge into long-tail search reranking—novel in this context—the method significantly reduces reliance on sparse user feedback and improves the ranking of high-quality yet underexposed videos. Both offline metrics (AUC, NDCG@K, human preference) and online A/B tests (15% traffic) demonstrate substantial gains over strong baselines in user satisfaction and engagement.
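The second training stage learns partial orderings among candidate videos via pairwise preference optimization. As a minimal sketch (not the paper's exact objective), a Bradley–Terry style pairwise logistic loss over the estimator's scalar experience scores captures the idea: the preferred video of each pair should score higher, regardless of how sparse its interaction history is.

```python
import math

def pairwise_preference_loss(score_preferred: float, score_dispreferred: float) -> float:
    """Pairwise logistic loss for one (preferred, dispreferred) video pair.

    Minimizing this pushes the experience estimator to assign the preferred
    candidate a higher score. Generic sketch only; the paper's actual
    preference-optimization objective may differ in form.
    """
    return math.log(1.0 + math.exp(-(score_preferred - score_dispreferred)))

# Hypothetical scores: a high-quality long-tail video vs. a clickbait video.
loss_ordered = pairwise_preference_loss(2.0, 0.5)    # pair already ranked correctly
loss_misordered = pairwise_preference_loss(0.5, 2.0)  # misranked pair incurs a larger loss
```

Because the loss depends only on the score gap, the estimator needs no absolute relevance labels, only relative judgments between videos, which is what the LLM-synthesized supervision provides.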

📝 Abstract
With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, it suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow content. Recent advancements in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a powerful mechanism to assess content quality, agnostic to sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework, which estimates user experience without real user behavior. The approach involves a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage incorporates pairwise preference optimization to help the model learn partial orderings among candidates. At inference time, the resulting experience scores are used to promote high-quality but underexposed videos in reranking, and further guide page-level optimization through reinforcement learning. Experiments show that the proposed method achieves consistent improvements over strong baselines in offline metrics including AUC, NDCG@K, and human preference judgment. An online A/B test covering 15% of traffic further demonstrates gains in both user experience and consumption metrics, confirming the practical value of the approach in long-tail video search scenarios.
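NDCG@K, one of the offline metrics reported above, discounts graded relevance by rank position and normalizes by the ideal ordering, so it rewards rerankers that surface high-quality videos near the top of the page. A standard implementation (the labels below are hypothetical, not from the paper):

```python
import math

def ndcg_at_k(relevances: list, k: int) -> float:
    """NDCG@K: DCG of the top-k graded relevances in ranked order,
    normalized by the DCG of the ideal (descending) ordering."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance labels for one reranked result page,
# in the order the reranker placed them:
score = ndcg_at_k([3, 2, 0, 1], k=4)  # < 1.0 because positions 3 and 4 are swapped
```

A perfectly ordered page scores exactly 1.0, so offline comparisons between rerankers reduce to averaging this value over evaluation queries.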
Problem

Research questions and friction points this paper is trying to address.

long-tail search
short-video search
Matthew effect
content quality
user behavior sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven reranking
multimodal evidence
long-tail search
preference optimization
reinforcement learning
👥 Authors
Wenyi Xu (Zhejiang University; Kuaishou Technology)
Feiran Zhu (Kuaishou Technology)
Songyang Li (Kuaishou Technology)
Renzhe Zhou (Kuaishou Technology)
Chao Zhang (Alibaba)
Chenglei Dai (Kuaishou Technology)
Yuren Mao (Zhejiang University)
Yunjun Gao (Professor of Computer Science, Zhejiang University; Database, Big Data Management and Analytics, and AI Interaction with DB Technology)
Yi Zhang (Huawei Co., Ltd; CV, AI, Trustworthy AI)