🤖 AI Summary
Cross-platform SKU matching in e-commerce is hampered by missing identifiers, wide variation in product naming, and methods that overlook fine-grained attributes (e.g., brand, specifications, bundle configurations), making conventional rule- and keyword-based approaches unreliable. This paper proposes Question to Knowledge (Q2K), a multi-agent framework built on large language models: a Reasoning Agent generates targeted disambiguation questions; a Knowledge Agent resolves them via focused web searches; and a Deduplication Agent reuses validated reasoning traces to reduce redundancy and keep decisions consistent. A human-in-the-loop mechanism further refines uncertain cases, improving both matching accuracy and interpretability. Evaluated on real-world consumer goods datasets, Q2K outperforms strong baselines, excelling in difficult scenarios such as brand origin disambiguation and bundle identification, and by reusing retrieved reasoning instead of issuing repeated searches it balances accuracy with efficiency.
📝 Abstract
Identifying whether two product listings refer to the same Stock Keeping Unit (SKU) is a persistent challenge in e-commerce, especially when explicit identifiers are missing and product names vary widely across platforms. Rule-based heuristics and keyword similarity often misclassify products by overlooking subtle distinctions in brand, specification, or bundle configuration. To overcome these limitations, we propose Question to Knowledge (Q2K), a multi-agent framework that leverages Large Language Models (LLMs) for reliable SKU mapping. Q2K integrates: (1) a Reasoning Agent that generates targeted disambiguation questions, (2) a Knowledge Agent that resolves them via focused web searches, and (3) a Deduplication Agent that reuses validated reasoning traces to reduce redundancy and ensure consistency. A human-in-the-loop mechanism further refines uncertain cases. Experiments on real-world consumer goods datasets show that Q2K surpasses strong baselines, achieving higher accuracy and robustness in difficult scenarios such as bundle identification and brand origin disambiguation. By reusing retrieved reasoning instead of issuing repeated searches, Q2K balances accuracy with efficiency, offering a scalable and interpretable solution for product integration.
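The three-agent pipeline in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `ReasoningAgent`, `KnowledgeAgent`, and `DeduplicationAgent` classes, their methods, and the stubbed answer lookup are all hypothetical stand-ins (a real Reasoning Agent would be an LLM call, and a real Knowledge Agent would issue focused web searches). The sketch only shows the data flow Q2K describes: questions are generated per listing pair, answered once, cached by the deduplication layer, and a pair maps to one SKU only if every disambiguation check passes.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """A product listing from one platform (title only, for illustration)."""
    title: str


class ReasoningAgent:
    """Generates targeted disambiguation questions for a listing pair.
    Stand-in for an LLM; the question templates here are assumptions."""
    def ask(self, a: Listing, b: Listing) -> list[str]:
        return [
            f"Do '{a.title}' and '{b.title}' share the same brand?",
            f"Do '{a.title}' and '{b.title}' have identical specifications?",
            f"Is exactly one of '{a.title}' / '{b.title}' a multi-item bundle?",
        ]


class KnowledgeAgent:
    """Resolves questions; a real system would run focused web searches.
    Here answers come from a pre-filled lookup table (an assumption)."""
    def __init__(self, knowledge: dict[str, bool]):
        self.knowledge = knowledge

    def answer(self, question: str) -> bool:
        # Unknown questions default to False (no evidence of a match).
        return self.knowledge.get(question, False)


class DeduplicationAgent:
    """Caches validated question -> answer traces so repeated questions
    are reused instead of triggering another retrieval."""
    def __init__(self):
        self.cache: dict[str, bool] = {}

    def resolve(self, question: str, knowledge: KnowledgeAgent) -> bool:
        if question not in self.cache:
            self.cache[question] = knowledge.answer(question)
        return self.cache[question]


def match_sku(a: Listing, b: Listing, reasoner: ReasoningAgent,
              knowledge: KnowledgeAgent, dedup: DeduplicationAgent) -> bool:
    """Two listings map to the same SKU only if every check passes.
    The last question is phrased so True means a bundle mismatch."""
    questions = reasoner.ask(a, b)
    brand_ok = dedup.resolve(questions[0], knowledge)
    spec_ok = dedup.resolve(questions[1], knowledge)
    bundle_mismatch = dedup.resolve(questions[2], knowledge)
    return brand_ok and spec_ok and not bundle_mismatch
```

In this toy version the Deduplication Agent is just a per-question cache; calling `match_sku` again on the same pair (or on another pair that raises an identical question) reads the cached answer rather than re-querying the Knowledge Agent, which is the accuracy/efficiency trade-off the abstract attributes to reusing retrieved reasoning.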