NEMO-4-PAYPAL: Leveraging NVIDIA's NeMo Framework for empowering PayPal's Commerce Agent

📅 2025-12-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the critical performance bottleneck in PayPal's commerce agent, where retrieval accounts for more than 50% of end-to-end latency, this work presents a production-grade, e-commerce-oriented multi-agent system. It pioneers the application of NVIDIA NeMo to retrieval optimization, pairing the Nemotron 8B small language model with a retrieval-specific LoRA fine-tuning strategy trained using AdamW, learning-rate sweeps, and cosine annealing scheduling. Experiments demonstrate a >50% reduction in retrieval latency and substantial decreases in overall inference latency and computational cost, while maintaining or improving task accuracy and user experience. Key contributions include: (1) the first industrial deployment of NeMo in an e-commerce multi-agent system; (2) a retrieval-oriented, lightweight LLM fine-tuning paradigm; and (3) a scalable, low-latency, cost-efficient, and high-quality commercial agent architecture.
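The LoRA strategy described above freezes the base model's weights and learns only a low-rank additive update. A minimal sketch of the underlying math in plain Python (illustrative only: the actual fine-tuning uses NeMo's training stack, and all dimensions and values here are made up):

```python
import random

def lora_forward(x, W, A, B, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x).

    W: frozen base weight matrix (d_out x d_in)
    A: trainable down-projection (r x d_in)
    B: trainable up-projection (d_out x r), zero-initialized so the
       adapted model starts out identical to the base model.
    """
    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    base = matvec(W, x)                  # frozen base-model output
    delta = matvec(B, matvec(A, x))      # low-rank update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy dimensions (hypothetical): d_in=4, d_out=3, rank r=2.
d_in, d_out, r, alpha = 4, 3, 2, 4.0
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]    # zero init: no change at step 0

x = [1.0, 2.0, 3.0, 4.0]
y0 = lora_forward(x, W, A, B, alpha, r)
```

Because only A and B (roughly r * (d_in + d_out) parameters per adapted matrix) are trained, the fine-tuning footprint stays small, which is what makes adapting an 8B model for a single retrieval task practical.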

📝 Abstract
We present the development and optimization of PayPal's Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Through our strategic partnership with NVIDIA, we leveraged the NeMo Framework for LLM fine-tuning to enhance agent performance. Specifically, we optimized the Search and Discovery agent by replacing our base model with a fine-tuned Nemotron small language model (SLM). We conducted comprehensive experiments using the llama3.1-nemotron-nano-8B-v1 architecture, training LoRA-based models through systematic hyperparameter sweeps across learning rates, optimizers (Adam, AdamW), cosine annealing schedules, and LoRA ranks. Our contributions include: (1) the first application of NVIDIA's NeMo Framework to commerce-specific agent optimization, (2) an LLM-powered fine-tuning strategy for retrieval-focused commerce tasks, (3) demonstration of significant improvements in latency and cost while maintaining agent quality, and (4) a scalable framework for multi-agent system optimization in production e-commerce environments. Our results demonstrate that the fine-tuned Nemotron SLM effectively resolves the key performance issue in the retrieval component, which represents over 50% of total agent response time, while maintaining or enhancing overall system performance.
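The cosine annealing schedule mentioned in the abstract decays the learning rate from a peak value to a floor over the training horizon. A self-contained sketch of the standard formula (the peak and floor values here are hypothetical placeholders, not the paper's actual settings):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=2e-4, lr_min=1e-6):
    """Standard cosine annealing: lr_max at step 0, lr_min at total_steps.

    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
    """
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Sample the schedule at a few points over a 1000-step run.
schedule = [cosine_annealed_lr(s, 1000) for s in range(0, 1001, 250)]
```

The schedule decays slowly near the start (preserving the swept peak learning rate) and near the end (allowing fine convergence), which is why it pairs naturally with the learning-rate sweeps the abstract describes.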
Problem

Research questions and friction points this paper is trying to address.

Optimizing PayPal's commerce agent performance using NVIDIA's NeMo framework
Enhancing search and discovery agent through fine-tuned language models
Reducing latency and cost while maintaining agent quality in e-commerce
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Nemotron SLM for commerce agent optimization
Used NeMo Framework for LLM fine-tuning and hyperparameter sweeps
Scalable multi-agent framework for e-commerce performance improvement
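The systematic sweeps over learning rates, optimizers, and LoRA ranks amount to enumerating a Cartesian grid of training configurations. A sketch of how such a grid might be built (the specific values below are placeholders, not the paper's actual search space):

```python
from itertools import product

# Hypothetical search space for illustration only.
learning_rates = [1e-5, 5e-5, 1e-4]
optimizers = ["adam", "adamw"]
lora_ranks = [8, 16, 32]

# One dict per fine-tuning run, covering every combination.
sweep = [
    {"lr": lr, "optimizer": opt, "lora_rank": rank}
    for lr, opt, rank in product(learning_rates, optimizers, lora_ranks)
]
# 3 learning rates x 2 optimizers x 3 ranks = 18 configurations
```

Each configuration would launch one fine-tuning run, with the best checkpoint selected on a held-out retrieval evaluation set.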
Authors

Ali Sahami (PayPal AI)
Sudhanshu Garg (LinkedIn Inc.)
Andrew Wang (University of Toronto, Vector Institute)
Chaitanya Kulkarni (PayPal AI)
Farhad Farahani (PayPal AI)
Sean Yun-Shiuan Chuang (PayPal AI)
Jian Wan (PayPal AI)
Srinivasan Manoharan (PayPal AI)
Uma Kona (PayPal AI)
Nitin Sharma (North Carolina State University)
Linsey Pang (PayPal AI)
Prakhar Mehrotra (PayPal AI)
Jessica Clark (NVIDIA)
Mark Moyou (NVIDIA)