🤖 AI Summary
This work addresses the challenge of sparse user interactions in long-tail short-video search, which often leads retrieval models to favor low-quality content such as clickbait. To mitigate this bias, the authors propose a multimodal reranking framework powered by large language models (LLMs). The approach employs a two-stage training paradigm to construct an unbiased user experience estimator: first, LLMs synthesize high-quality supervision signals by integrating multimodal evidence for model fine-tuning; second, pairwise preference optimization teaches the model partial orderings among candidate videos. At inference time, reinforcement learning further performs page-level optimization. By incorporating LLMs' world knowledge into long-tail search reranking, which is novel in this context, the method significantly reduces reliance on sparse user feedback and improves the ranking of high-quality yet underexposed videos. Both offline metrics (AUC, NDCG@K, human preference) and online A/B tests (15% traffic) demonstrate substantial gains over strong baselines in user satisfaction and engagement.
📝 Abstract
With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, it suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow content. Recent advancements in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a powerful mechanism to assess content quality independently of sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework, which estimates user experience without real user behavior. The approach involves a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage incorporates pairwise preference optimization to help the model learn partial orderings among candidates. At inference time, the resulting experience scores are used to promote high-quality but underexposed videos in reranking, and further guide page-level optimization through reinforcement learning. Experiments show that the proposed method achieves consistent improvements over strong baselines in offline metrics including AUC, NDCG@K, and human preference judgment. An online A/B test covering 15% of traffic further demonstrates gains in both user experience and consumption metrics, confirming the practical value of the approach in long-tail video search scenarios.
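To make the second training stage concrete, below is a minimal sketch of one common form of pairwise preference optimization: a Bradley-Terry style logistic loss over pairs of experience scores. The paper's exact objective is not given in this abstract, so the function name, the score values, and the loss form here are illustrative assumptions only.

```python
import math

def pairwise_preference_loss(score_preferred: float, score_other: float) -> float:
    """Bradley-Terry style pairwise loss: -log sigmoid(s_preferred - s_other).
    Illustrative assumption, not the paper's stated objective. The loss is small
    when the preferred video's experience score exceeds the other's, pushing the
    estimator to respect the partial ordering between the pair."""
    margin = score_preferred - score_other
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical experience scores for two candidate videos on a long-tail query.
loss_correct = pairwise_preference_loss(2.0, 0.5)  # preferred scored higher: small loss
loss_wrong = pairwise_preference_loss(0.5, 2.0)    # ordering violated: large loss
print(loss_correct < loss_wrong)  # True
```

Summing this loss over many LLM-labeled preference pairs would train the estimator to rank higher-quality videos above lower-quality ones without requiring any user interaction data.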