Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face deployment bottlenecks in e-commerce search due to high inference latency, hindering real-time relevance modeling. Method: We propose a knowledge distillation framework tailored for low-latency production environments. It employs soft-label classification fine-tuning for the LLM teacher and introduces a pairwise relevance-margin-based distillation objective (MSE loss). Synthetic training data, labeled by the teacher, substantially expands the student’s training set. Contribution/Results: The distilled lightweight student model deployed on Walmart.com achieves significant gains in search relevance metrics while maintaining end-to-end latency at the millisecond level. Performance improves consistently with increasing synthetic data volume and ultimately surpasses the teacher’s accuracy. This approach effectively reconciles the strong ranking capability of LLMs with stringent real-time inference requirements.

📝 Abstract
Ensuring that the products displayed in e-commerce search results are relevant to users' queries is crucial for improving the user experience. With their advanced semantic understanding, deep learning models have been widely used for relevance matching in search tasks. While large language models (LLMs) offer superior ranking capabilities, deploying them in real-time systems is challenging because their inference latency exceeds strict production latency budgets. To leverage the ranking power of LLMs while meeting the low-latency demands of production systems, we propose a novel framework that distills a high-performing LLM into a more efficient, low-latency student model. To help the student model learn more effectively from the teacher model, we first train the teacher LLM as a classification model with soft targets. Then, we train the student model to capture the relevance margin between pairs of products for a given query using a mean squared error loss. Instead of using the same training data as the teacher model, we significantly expand the student model's training set by generating unlabeled data and labeling it with the teacher model's predictions. Experimental results show that the student model's performance continues to improve as the size of the augmented training data increases. In fact, with enough augmented data, the student model can outperform the teacher model. The student model has been successfully deployed in production at Walmart.com with significantly positive metrics.
Problem

Research questions and friction points this paper is trying to address.

Enhancing e-commerce search relevance using large language models
Reducing LLM latency for real-time search systems
Distilling LLM knowledge into efficient student models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilling LLM into efficient student model
Expanding dataset with teacher-generated labels
Using MSE loss for relevance margin learning
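
The pairwise relevance-margin objective listed above can be sketched as follows. This is an illustrative reconstruction, not code from the paper: the function name `margin_mse_loss` and the numpy formulation are assumptions; the paper only specifies that the student's score margin for a product pair is regressed onto the teacher's soft-score margin with MSE loss.

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Pairwise relevance-margin MSE loss (illustrative sketch).

    For each (query, product pair), the student is trained so that the
    difference between its two relevance scores matches the difference
    between the teacher LLM's soft scores for the same pair.
    """
    # Margin = score(more relevant product) - score(less relevant product)
    student_margin = np.asarray(student_pos) - np.asarray(student_neg)
    teacher_margin = np.asarray(teacher_pos) - np.asarray(teacher_neg)
    # Mean squared error between student and teacher margins
    return float(np.mean((student_margin - teacher_margin) ** 2))
```

Because the loss constrains only score differences, the student learns the teacher's relative ordering of products rather than its absolute score calibration, which is what matters for ranking.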
👥 Authors
Hongwei Shang
Walmart Global Tech
Nguyen Vo
Amazon Inc., Sunnyvale, CA, USA
Nitin Yadav
Unknown affiliation
Tian Zhang
Walmart Global Tech, Sunnyvale, CA, USA
Ajit Puthenputhussery
Data Scientist, Walmart Labs
Xunfan Cai
Walmart Global Tech, Sunnyvale, CA, USA
Shuyi Chen
Walmart Global Tech, Sunnyvale, CA, USA
Prijith Chandran
Walmart Global Tech, Sunnyvale, CA, USA
Changsung Kang
Sr. Manager of Data Science at Walmart Global Tech