EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses heterogeneous product naming in e-commerce titles, where promotional phrases, platform tags, and bundle descriptions make the same product appear under many different names. The authors propose EPM-RL, a framework that applies reinforcement learning to product mapping. A small student model is first trained with parameter-efficient fine-tuning (PEFT) on human-verified product pairs with LLM-generated rationales, then optimized with a composite, agent-based reward covering output format, label correctness, and judged reasoning quality. This distills high-latency, costly multi-agent inference into a private, locally deployable model. In preliminary results, EPM-RL improves over PEFT-only training and offers a stronger quality-cost trade-off than commercial API baselines, which makes it a candidate for privacy-sensitive, large-scale enterprise deployment.

📝 Abstract

Product mapping, the task of deciding whether two e-commerce listings refer to the same product, is a core problem for price monitoring and channel visibility. In real marketplaces, however, sellers frequently inject promotional keywords, platform-specific tags, and bundle descriptions into titles, causing the same product to appear under many different names. Recent LLM-based and multi-agent frameworks improve robustness and interpretability on such hard cases, but they often rely on expensive external APIs, repeated retrieval, and complex inference-time orchestration, making large-scale deployment costly and difficult in privacy-sensitive enterprise settings. To address these issues, we present EPM-RL, a reinforcement-learning-based framework for building an accurate and efficient on-premise e-commerce product mapping model. Our central idea is to distill high-cost agentic reasoning into a trainable in-house model. Starting from a curated set of product pairs with LLM-generated rationales and human verification, we first perform parameter-efficient fine-tuning (PEFT) on a small student model using structured reasoning outputs. We then further optimize the model with reinforcement learning (RL) using an agent-based reward that jointly evaluates output-format compliance, label correctness, and reasoning-preference scores from specially designed judge models. Preliminary results show that EPM-RL consistently improves over PEFT-only training and offers a stronger quality-cost trade-off than commercial API-based baselines, while enabling private deployment and lower operational cost. These findings suggest that reinforcement learning can turn product mapping from a high-latency agentic pipeline into a scalable, inspectable, and production-ready in-house system.
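
To make the composite reward described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the student model emits a JSON object with hypothetical `label` and `rationale` fields, that the judge model is available as a callable returning a score in [0, 1], and that the three components are combined as a weighted sum. The field names, label set, weights, and the choice to return zero reward on malformed output are all illustrative assumptions; the abstract does not specify them.

```python
import json
from typing import Callable

# Hypothetical label set; the paper's actual output schema is not given in the abstract.
VALID_LABELS = {"match", "mismatch"}


def composite_reward(
    output: str,
    gold_label: str,
    judge: Callable[[str], float],
    w_format: float = 0.2,   # assumed weight for format compliance
    w_label: float = 0.5,    # assumed weight for label correctness
    w_reason: float = 0.3,   # assumed weight for the judge's reasoning score
) -> float:
    """Score one model output on format, label, and reasoning quality."""
    # 1) Output-format compliance: must be a JSON object with the expected fields.
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    label = parsed.get("label")
    rationale = parsed.get("rationale")
    if label not in VALID_LABELS:
        return 0.0
    if not isinstance(rationale, str) or not rationale.strip():
        return 0.0

    # 2) Label correctness against the human-verified gold label.
    label_score = 1.0 if label == gold_label else 0.0

    # 3) Reasoning preference: judge score, clamped to [0, 1].
    reason_score = min(1.0, max(0.0, float(judge(rationale))))

    return w_format + w_label * label_score + w_reason * reason_score
```

In this sketch, a well-formed output with the correct label and a judge score of 1.0 receives the maximum reward of 1.0, while an unparseable output receives 0.0. Gating the other components on format compliance is one possible design choice; an additive penalty would be an equally plausible alternative.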

Problem

Research questions and friction points this paper is trying to address.

product mapping

e-commerce

on-premise deployment

privacy-sensitive

promotional noise

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

On-Premise Deployment

Product Mapping