CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets

📅 2025-09-01
🤖 AI Summary
To address the cold-start relevance matching challenge in emerging e-commerce markets, where human labels and user behavioral data are scarce, this paper proposes the Cold-Start Relevance Matching (CSRM) framework, built on a multilingual large language model. First, it uses machine translation tasks to activate the model's cross-lingual transfer capabilities. Second, it adds retrieval-based query augmentation to improve query understanding and incorporate e-commerce knowledge. Third, it introduces a multi-round self-distillation training strategy to mitigate the impact of training label errors. Together, these techniques reduce reliance on human labels and historical behavior data from the target market. Online deployment results show a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
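The retrieval-based query augmentation idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `KNOWN_QUERIES` catalog, the bag-of-words cosine retriever, and the `augment_query` helper are hypothetical and are not taken from the paper, which does not describe its retriever at this level of detail here.

```python
import math
from collections import Counter

# Hypothetical store of already-understood queries with category hints.
KNOWN_QUERIES = {
    "wireless bluetooth earbuds": "Electronics > Audio > Earphones",
    "running shoes men": "Fashion > Shoes > Sneakers",
    "baby diapers size 3": "Baby > Diapers",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def augment_query(query: str, k: int = 1) -> str:
    """Append the top-k retrieved neighbours (with category) to the query."""
    q_vec = Counter(query.lower().split())
    scored = sorted(
        (
            (cosine(q_vec, Counter(known.split())), known, category)
            for known, category in KNOWN_QUERIES.items()
        ),
        reverse=True,
    )
    # Keep only neighbours that share at least one token with the query.
    hints = [f"{known} [{cat}]" for score, known, cat in scored[:k] if score > 0]
    if not hints:
        return query
    return f"{query} | related: " + "; ".join(hints)

print(augment_query("bluetooth earbuds"))
# bluetooth earbuds | related: wireless bluetooth earbuds [Electronics > Audio > Earphones]
```

In a real system the lexical retriever would likely be replaced by a dense or hybrid retriever, and the augmented string would be fed to the relevance model as extra context; the sketch only shows the shape of the data flow.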

📝 Abstract
As global e-commerce platforms continue to expand, companies are entering new markets where they encounter cold-start challenges due to limited human labels and user behavior data. In this paper, we share our experience at Coupang in achieving competitive cold-start relevance matching performance for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework that uses a multilingual Large Language Model (LLM) to address three challenges: (1) activating the cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge through retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, resulting in successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
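As a rough illustration of challenge (1), machine translation can be framed as a supervised text-to-text task by turning parallel text into prompt/target pairs. The prompt wording, the `TrainingExample` type, the Korean-to-English direction, and the sample pair below are all hypothetical; the abstract does not specify the languages or the data format used.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    prompt: str   # instruction plus source-language text
    target: str   # reference translation the model should produce

def make_translation_example(
    text: str, translation: str, src_lang: str, tgt_lang: str
) -> TrainingExample:
    """Wrap one parallel sentence pair as an instruction-style example."""
    prompt = (
        f"Translate the following product title from {src_lang} "
        f"to {tgt_lang}:\n{text}"
    )
    return TrainingExample(prompt=prompt, target=translation)

# Hypothetical parallel data (illustrative only).
pairs = [("무선 블루투스 이어폰", "wireless bluetooth earphones")]
examples = [make_translation_example(s, t, "Korean", "English") for s, t in pairs]
print(examples[0].prompt)
```

Examples of this shape could be mixed into fine-tuning data so the model practices aligning the two languages before it is trained on relevance labels.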
Problem

Research questions and friction points this paper is trying to address.

Addressing cold-start relevance matching in new e-commerce markets
Utilizing multilingual LLMs for cross-lingual transfer learning
Mitigating training label errors through self-distillation strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual LLM for cross-lingual transfer learning
Retrieval-based query augmentation for e-commerce
Multi-round self-distillation training strategy
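The multi-round self-distillation idea can be sketched on toy data. This is one common form of self-distillation, in which each round's model predictions are blended back into the training labels; the one-dimensional feature, the centroid-based model, the blending weight `alpha`, and the function names are assumptions for illustration, not the training recipe from the paper.

```python
import math

def fit_centroids(xs, soft_labels):
    """Label-weighted feature means for the relevant / irrelevant classes.

    Assumes both classes carry non-zero total weight.
    """
    pos_w = sum(soft_labels)
    neg_w = len(xs) - pos_w
    pos = sum(x * p for x, p in zip(xs, soft_labels)) / pos_w
    neg = sum(x * (1 - p) for x, p in zip(xs, soft_labels)) / neg_w
    return pos, neg

def predict(x, pos, neg):
    """Probability of 'relevant' from squared distances to both centroids."""
    return 1 / (1 + math.exp((x - pos) ** 2 - (x - neg) ** 2))

def self_distill(xs, noisy_labels, rounds=3, alpha=0.5):
    """Each round: train on current labels, then blend in the predictions."""
    labels = [float(y) for y in noisy_labels]
    for _ in range(rounds):
        pos, neg = fit_centroids(xs, labels)
        preds = [predict(x, pos, neg) for x in xs]
        labels = [alpha * y + (1 - alpha) * p for y, p in zip(labels, preds)]
    return labels

# Toy relevance scores; the last two labels are deliberately flipped.
xs = [0.0, 0.2, 0.1, 3.0, 3.2, 2.9, 0.15, 3.1]
noisy = [0, 0, 0, 1, 1, 1, 1, 0]
refined = self_distill(xs, noisy)
print([round(p, 2) for p in refined])
```

On this toy set the two flipped labels drift toward the side their features suggest, which is the effect a self-distillation loop aims for when training labels contain errors.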
Authors
Yujing Wang
Coupang, Inc., Beijing, China
Yiren Chen
Coupang, Inc., Beijing, China
Huoran Li
Coupang, Inc., Beijing, China
Chunxu Xu
Coupang, Inc., Beijing, China
Yuchong Luo
Coupang, Inc., Beijing, China
Xianghui Mao
Coupang, Inc., Beijing, China
Cong Li
Coupang, Inc., Beijing, China
Lun Du
Coupang, Inc., Beijing, China
Chunyang Ma
Coupang, Inc., Beijing, China
Qiqi Jiang
Coupang, Inc., Beijing, China
Yin Wang
Coupang, Inc., Beijing, China
Fan Gao
Caltech; MIT
NGS Bioinformatics, Image data processing, AI/ML, Neurodegeneration, Protein Bioinformatics
Wenting Mo
Coupang, Inc., Beijing, China
Pei Wen
Coupang, Inc., Beijing, China
Shantanu Kumar
Coupang, Inc., Seoul, Republic of Korea
Taejin Park
NVIDIA
Speech Signal Processing, Audio Signal Processing, Machine Learning
Yiwei Song
Coupang, Inc., Mountain View, United States
Vijay Rajaram
Coupang, Inc., Mountain View, United States
Tao Cheng
Professor in GeoInformatics, University College London
Geographical Information Science, Space-Time Analytics, Smart Cities, GeoComputation, Network Complexity
Sonu Durgia
Coupang, Inc., Mountain View, United States
Pranam Kolari
Coupang, Inc., Mountain View, United States