NEMO-4-PAYPAL: Leveraging NVIDIA's Nemo Framework for empowering PayPal's Commerce Agent

📅 2025-12-25

🤖 AI Summary

This work targets the main performance bottleneck in PayPal's Commerce Agent: the retrieval component accounts for over 50% of end-to-end response time. To address it, the authors build a production-grade multi-agent system for e-commerce and use NVIDIA NeMo to optimize retrieval. They replace the Search and Discovery agent's base model with a fine-tuned Nemotron 8B small language model (SLM). Fine-tuning uses LoRA, with systematic hyperparameter sweeps over learning rates, optimizers (Adam, AdamW), cosine annealing schedules, and LoRA ranks. The authors report that the fine-tuned model resolves the retrieval bottleneck and significantly reduces latency and cost, while maintaining or improving agent quality. Key contributions include: (1) the first application of NeMo to commerce-specific agent optimization; (2) a lightweight fine-tuning strategy for retrieval-focused commerce tasks; and (3) a scalable, low-latency, cost-efficient framework for multi-agent systems in production e-commerce.
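The summary mentions cosine annealing scheduling. The following is a minimal sketch of the standard cosine decay from a peak learning rate to a floor, with no warmup: lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)). This is an illustration, not the paper's configuration. The paper's actual schedule parameters are not given on this page, so the helper name and all numeric values below are hypothetical.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing from lr_max down to lr_min.

    Hypothetical helper for illustration; schedule values are not taken from the paper.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so the rate stays at lr_min after the schedule ends.
    t = min(max(step, 0), total_steps)
    # Cosine factor falls smoothly from 1 (t = 0) to 0 (t = total_steps).
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / total_steps))
    return lr_min + (lr_max - lr_min) * cosine


if __name__ == "__main__":
    # Hypothetical values: peak LR 1e-4, floor 1e-6, 1000 training steps.
    for s in (0, 250, 500, 750, 1000):
        print(s, f"{cosine_annealing_lr(s, 1000, 1e-4, 1e-6):.2e}")
```

In a sweep, the peak rate and the floor are the two values such a schedule exposes for tuning.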

📝 Abstract

We present the development and optimization of PayPal's Commerce Agent, powered by NEMO-4-PAYPAL, a multi-agent system designed to revolutionize agentic commerce on the PayPal platform. Through our strategic partnership with NVIDIA, we leveraged the NeMo Framework for LLM fine-tuning to enhance agent performance. Specifically, we optimized the Search and Discovery agent by replacing our base model with a fine-tuned Nemotron small language model (SLM). We conducted comprehensive experiments using the llama3.1-nemotron-nano-8B-v1 architecture, training LoRA-based models through systematic hyperparameter sweeps across learning rates, optimizers (Adam, AdamW), cosine annealing schedules, and LoRA ranks. Our contributions include: (1) the first application of NVIDIA's NeMo Framework to commerce-specific agent optimization, (2) an LLM-powered fine-tuning strategy for retrieval-focused commerce tasks, (3) a demonstration of significant improvements in latency and cost while maintaining agent quality, and (4) a scalable framework for multi-agent system optimization in production e-commerce environments. Our results demonstrate that the fine-tuned Nemotron SLM effectively resolves the key performance issue in the retrieval component, which represents over 50% of total agent response time, while maintaining or enhancing overall system performance.
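The abstract names four swept dimensions: learning rate, optimizer (Adam, AdamW), cosine annealing schedule, and LoRA rank. The sketch below shows how a full grid over those dimensions can be enumerated, with one run configuration per combination. The candidate values, the `SweepConfig` type, and the use of a learning-rate floor to vary the cosine schedule are all hypothetical; the abstract does not list the values actually tried.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator


@dataclass(frozen=True)
class SweepConfig:
    learning_rate: float
    optimizer: str        # "adam" or "adamw"
    lora_rank: int
    min_lr_ratio: float   # hypothetical: cosine schedule floor as a fraction of the peak LR

    @property
    def run_name(self) -> str:
        # Readable, unique identifier for logging each training run.
        return f"{self.optimizer}_lr{self.learning_rate:g}_r{self.lora_rank}_min{self.min_lr_ratio:g}"


# Hypothetical search space: the paper names these dimensions but not these values.
LEARNING_RATES = [1e-5, 5e-5, 1e-4]
OPTIMIZERS = ["adam", "adamw"]
LORA_RANKS = [8, 16, 32]
MIN_LR_RATIOS = [0.0, 0.1]


def sweep_grid() -> Iterator[SweepConfig]:
    """Yield every combination of the swept hyperparameters (full grid search)."""
    for lr, opt, rank, ratio in product(LEARNING_RATES, OPTIMIZERS, LORA_RANKS, MIN_LR_RATIOS):
        yield SweepConfig(lr, opt, rank, ratio)


if __name__ == "__main__":
    configs = list(sweep_grid())
    print(len(configs), "runs")  # 3 * 2 * 3 * 2 = 36
    print(configs[0].run_name)
```

A full grid grows multiplicatively with each added value. Keeping each dimension to a few candidates keeps the number of fine-tuning runs manageable.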

Problem

Research questions and friction points this paper is trying to address.

Optimizing PayPal's commerce agent performance using NVIDIA's NeMo framework

Enhancing the Search and Discovery agent through fine-tuned language models

Reducing latency and cost while maintaining agent quality in e-commerce
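Retrieval takes over 50% of total agent response time, so it is the natural target for reducing latency. Amdahl's law shows why. If a step takes a share p of total time and becomes s times faster, the new end-to-end latency is (1 - p) + p / s of the old one. The sketch below applies that formula. The function name and the example numbers are illustrative assumptions, not measurements from the paper.

```python
def end_to_end_latency_ratio(retrieval_share: float, retrieval_speedup: float) -> float:
    """New end-to-end latency as a fraction of the old one (Amdahl's law).

    retrieval_share: fraction of total response time spent in retrieval, in [0, 1].
    retrieval_speedup: factor by which retrieval becomes faster (2.0 means twice as fast).
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= retrieval_share <= 1.0:
        raise ValueError("retrieval_share must be in [0, 1]")
    if retrieval_speedup <= 0:
        raise ValueError("retrieval_speedup must be positive")
    # The non-retrieval part is unchanged; only the retrieval part shrinks.
    return (1.0 - retrieval_share) + retrieval_share / retrieval_speedup


if __name__ == "__main__":
    # Illustrative only: retrieval at 50% of response time, made 2x faster.
    print(end_to_end_latency_ratio(0.5, 2.0))  # 0.75, i.e. 25% lower end-to-end latency
```

The formula also gives an upper bound. Even an infinitely fast retrieval step can cut end-to-end latency by at most its share p. That is why a component above 50% of response time is the largest single lever.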

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Nemotron SLM for commerce agent optimization

NeMo Framework used for LLM fine-tuning and hyperparameter sweeps

Scalable multi-agent framework for e-commerce performance improvement
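The abstract describes the fine-tuned Nemotron SLM as LoRA-based, and LoRA rank is one of the swept hyperparameters. The sketch below is an illustration, not the paper's code. It counts how many parameters standard LoRA trains for a single weight matrix, compared with full fine-tuning, which shows why rank directly controls training cost. The function names are hypothetical. The 4096-wide matrix is an assumption based on the Llama 3.1 8B architecture that the model builds on.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA trains for one frozen d_out x d_in weight matrix W.

    LoRA keeps W frozen and learns a low-rank update, so the effective weight is
    W + (alpha / rank) * B @ A, with B of shape (d_out, rank) and A of shape (rank, d_in).
    Only A and B are trained. Hypothetical helper for illustration.
    """
    if rank <= 0 or rank > min(d_in, d_out):
        raise ValueError("rank must be in [1, min(d_in, d_out)]")
    return rank * (d_in + d_out)


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Parameters updated if the same matrix were fine-tuned in full."""
    return d_in * d_out


if __name__ == "__main__":
    # Assumption: a square 4096 x 4096 projection (Llama 3.1 8B hidden size);
    # the ranks are hypothetical sweep values, not figures reported by the paper.
    d = 4096
    for r in (8, 16, 32):
        lora = lora_trainable_params(d, d, r)
        frac = lora / full_finetune_params(d, d)
        print(f"rank={r:>2}  LoRA params={lora:,}  fraction of full={frac:.2%}")
```

Doubling the rank doubles the trainable parameters per adapted matrix, so a rank sweep trades adapter capacity against training and memory cost.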
