Semantic Ads Retrieval at Walmart eCommerce with Language Models Progressively Trained on Multiple Knowledge Domains

πŸ“… 2025-02-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
E-commerce search advertising faces core challenges including linguistic asymmetry between queries and product titles, ambiguous user intent, and sparse, imbalanced training corpora. Method: This paper develops an end-to-end semantic retrieval system for Walmart, introducing a human-feedback-driven, progressive multi-domain knowledge-fusion training paradigm that jointly optimizes category-aware BERT pretraining and a two-tower Siamese architecture. Contribution/Results: The approach achieves significant gains in semantic matching accuracy while maintaining high engineering efficiency. Search relevance improves by up to 16% over a baseline DSSM model, and large-scale online A/B tests confirm substantial increases in advertising revenue. The system outperforms Walmart's current production model on key metrics, demonstrating robust generalization and operational scalability.


πŸ“ Abstract
Sponsored search in e-commerce poses several unique and complex challenges, stemming from the asymmetric language structure between search queries and product names, the inherent ambiguity of user search intent, and the vast volume of sparse, imbalanced search corpus data. The retrieval component plays a pivotal role within a sponsored search system: it is the initial step that directly affects the downstream ranking and bidding systems. In this paper, we present an end-to-end solution tailored to optimize the ads retrieval system on Walmart.com. First, we pretrain a BERT-like classification model with product category information, enhancing the model's understanding of Walmart product semantics. Second, we design a two-tower Siamese network structure for the embedding model to improve training efficiency. Third, we introduce a Human-in-the-loop Progressive Fusion Training method to ensure robust model performance. Our results demonstrate the effectiveness of this pipeline: it improves the search relevance metric by up to 16% compared to a baseline DSSM-based model, and large-scale online A/B testing shows that our approach surpasses the ad revenue of the existing production model.
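To make the two-tower idea concrete, here is a minimal sketch of a shared-weight (Siamese) text encoder scoring a query against a product title via cosine similarity. It is not the paper's model: the `TwoTowerEncoder` class, the hashing-based embedding table, and mean pooling are illustrative stand-ins for the category-aware BERT towers the paper actually trains.

```python
import numpy as np

class TwoTowerEncoder:
    """Toy Siamese encoder: both towers share one embedding table
    (here a random hashed lookup; the paper uses a pretrained BERT).
    Queries and product titles map into the same vector space."""

    def __init__(self, vocab_buckets=10_000, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.standard_normal((vocab_buckets, dim))

    def encode(self, text: str) -> np.ndarray:
        # Hash each token into a fixed bucket, then mean-pool.
        ids = [hash(tok) % len(self.table) for tok in text.lower().split()]
        vec = self.table[ids].mean(axis=0)
        return vec / np.linalg.norm(vec)  # L2-normalize for cosine scoring

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v)  # already normalized, so dot product == cosine

encoder = TwoTowerEncoder()
query_vec = encoder.encode("running shoes for women")
title_vec = encoder.encode("women's lightweight running shoe")
score = cosine(query_vec, title_vec)
```

Because the two towers share weights, product embeddings can be precomputed offline and retrieval reduces to a nearest-neighbor search over cosine scores, which is what makes the two-tower design efficient at Walmart's scale.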
Problem

Research questions and friction points this paper is trying to address.

Optimize semantic ads retrieval in e-commerce
Handle asymmetric language and ambiguous intent
Improve search relevance and ad revenue
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT-like model pretraining
Siamese Network structure
Human-in-the-loop training
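The listing does not detail the Human-in-the-loop Progressive Fusion Training procedure, so the sketch below only illustrates the general pattern the title and abstract suggest: one set of model parameters is carried through successive training stages, each stage folds in a new data domain, and a human-feedback filter (simulated here) drops rejected examples. The `sgd_stage` helper, the synthetic data, and the filtering rule are all hypothetical.

```python
import numpy as np

def sgd_stage(w, X, y, lr=0.1, epochs=50):
    """One training stage: batch-gradient logistic regression,
    starting from the weights produced by earlier stages."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        w = w - lr * X.T @ (p - y) / len(y)  # gradient descent step
    return w

rng = np.random.default_rng(0)
w = np.zeros(4)  # parameters shared across all stages
domains = []
for stage in range(3):  # e.g. click logs, catalog pairs, human-labeled pairs
    X = rng.standard_normal((200, 4))
    y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)
    keep = rng.random(200) > 0.05  # placeholder for human-feedback filtering
    domains.append((X[keep], y[keep]))
    # Progressive fusion: each stage retrains on all domains seen so far.
    Xall = np.vstack([Xd for Xd, _ in domains])
    yall = np.concatenate([yd for _, yd in domains])
    w = sgd_stage(w, Xall, yall)
```

The point of the pattern is that later, cleaner domains refine rather than overwrite what earlier, noisier domains taught the model, while the human filter keeps low-quality pairs out of every stage.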
πŸ”Ž Similar Papers
No similar papers found.
Zhaodong Wang
Research Scientist, Facebook
operations research, transportation, partial differential equations, optimization theory
Weizhi Du
Walmart Global Tech, Sunnyvale, California, USA
Md Omar Faruk Rokon
Sr Data Scientist, Walmart Global Tech
Search Relevance, Natural Language Processing, Data Mining, Information Retrieval
Pooshpendu Adhikary
Walmart Global Tech, Sunnyvale, California, USA
Yanbing Xue
Walmart Global Tech, Sunnyvale, California, USA
Jiaxuan Xu
Walmart Global Tech, Sunnyvale, California, USA
Jianghong Zhou
Walmart Global Tech, Sunnyvale, California, USA
Kuang-chih Lee
Director, Data Science, Alibaba Inc.
Machine Learning
Musen Wen
Walmart Global Tech, Sunnyvale, California, USA