HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads

📅 2025-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses insufficient semantic matching between queries and videos in short-video search advertising, which leads to suboptimal ad ranking, by proposing a high-consistency multimodal relevance modeling framework. Methodologically, it integrates vision–text alignment pretraining with multi-stage relevance optimization. Key contributions include: (1) a novel pseudo-query-guided triplet pretraining paradigm, where semantically controllable pseudo-queries are generated via keyword extraction from video text to strengthen discriminative learning among queries, target videos, and negatives; and (2) a hierarchical softmax loss explicitly aligned with ad-ranking objectives to enable fine-grained relevance modeling. Deployed stably in Kuaishou's search advertising system for over a year, the framework reduces irrelevant ad impressions by 6.1% and increases ad revenue by 1.4%, demonstrating strong industrial applicability and generalization capability.
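The pseudo-query idea in the summary can be illustrated with a toy sketch: extract keywords from a video's own text to act as a pseudo-query, then score the (pseudo-query, target video, negative video) triplet with a hinge loss. Everything below (the frequency-based extractor, bag-of-words cosine, the margin value) is an illustrative assumption, not the paper's implementation, which operates on learned multimodal embeddings rather than token overlap.

```python
from collections import Counter
import math

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "with"}

def pseudo_query(video_text: str, k: int = 3) -> list[str]:
    """Toy keyword extractor: top-k frequent non-stopword tokens form a pseudo-query."""
    tokens = [t.lower() for t in video_text.split() if t.lower() not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(k)]

def bow_cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def triplet_hinge_loss(query: list[str], pos_text: str, neg_text: str,
                       margin: float = 0.2) -> float:
    """The pseudo-query should score higher with its own video than with a negative."""
    pos = bow_cosine(query, pos_text.lower().split())
    neg = bow_cosine(query, neg_text.lower().split())
    return max(0.0, margin - (pos - neg))

video = "cheap running shoes for men running shoes sale"
q = pseudo_query(video)  # keywords drawn from the video's own text
loss = triplet_hinge_loss(q, video, "funny cat video compilation")
```

In the actual model the similarity would come from cross-modal encoders over visual signals and text, but the triplet structure (query, positive video, negative video) is the part the pretraining is made consistent with.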

📝 Abstract
Search advertising is essential for merchants to reach target users on short video platforms. Short video ads aligned with user search intents are displayed through relevance matching and bid ranking mechanisms. This paper focuses on improving query-to-video relevance matching to enhance the effectiveness of ranking in ad systems. Recent vision-language pre-training models have demonstrated promise in various multimodal tasks. However, their contribution to downstream query-video relevance tasks is limited, as the alignment between the pair of visual signals and text differs from the modeling of the triplet of the query, visual signals, and video text. In addition, our previous relevance model provides limited ranking capabilities, largely due to the discrepancy between the binary cross-entropy fine-tuning objective and the ranking objective. To address these limitations, we design a high-consistency multimodal relevance model (HCMRM). It utilizes a simple yet effective method to enhance the consistency between pre-training and relevance tasks. Specifically, during the pre-training phase, along with aligning visual signals and video text, several keywords are extracted from the video text as pseudo-queries to perform the triplet relevance modeling. For the fine-tuning phase, we introduce a hierarchical softmax loss, which enables the model to learn the order within labels while maximizing the distinction between positive and negative samples. This promotes the fusion ranking of relevance and bidding in the subsequent ranking stage. The proposed method has been deployed in the Kuaishou search advertising system for over a year, contributing to a 6.1% reduction in the proportion of irrelevant ads and a 1.4% increase in ad revenue.
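The abstract's hierarchical softmax can be read as a two-level decomposition: first separate positive from negative, then order the positive relevance grades. Below is a minimal numeric sketch of that reading, with the positive logits pooled at level 1; the specific decomposition and the three-grade label set are assumptions for illustration, not the paper's exact formulation.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def hierarchical_softmax_loss(logits: list[float], label: int) -> float:
    """logits = [irrelevant, grade-1 relevant, grade-2 relevant]; label in {0, 1, 2}.
    Level 1: positive vs. negative; level 2: which positive grade, given positive."""
    neg, g1, g2 = logits
    pos_pool = math.log(math.exp(g1) + math.exp(g2))  # pooled positive mass
    p_level1 = softmax([neg, pos_pool])
    if label == 0:
        return -math.log(p_level1[0])
    p_level2 = softmax([g1, g2])
    return -math.log(p_level1[1]) - math.log(p_level2[label - 1])
```

Under this decomposition, the level-1 term maximizes the distinction between positive and negative samples, while the level-2 term learns the order within the positive labels, matching the dual objective the abstract describes and supporting fusion ranking with bids downstream.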
Problem

Research questions and friction points this paper is trying to address.

Improve query-to-video relevance in ads
Enhance multimodal relevance model consistency
Optimize ad ranking with hierarchical softmax
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal relevance model
Hierarchical softmax loss
Pseudo-queries for triplet modeling
Guobing Gan (affiliation unknown)
Kaiming Gao (Kuaishou Technology, Beijing, China)
Li Wang (Kuaishou Technology, Beijing, China)
Shen Jiang (Kuaishou Technology, Beijing, China)
Peng Jiang (Kuaishou Technology, Beijing, China)