TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification

📅 2025-01-07
🤖 AI Summary
Existing Product Attribute Value Identification (PAVI) methods struggle to infer implicit attribute values, handle out-of-distribution (OOD) values, and produce normalized outputs, which makes them a poor fit for industrial e-commerce applications that span many categories, attributes, and values. This paper proposes TACLR, the first retrieval-based method for PAVI, built on a dual-tower architecture trained with taxonomy-aware contrastive learning. It combines taxonomy-aware hard negative sampling with an adaptive inference step that applies dynamic similarity thresholds, so that implicit value recall, OOD value handling, and normalized output are addressed together. The framework scales to thousands of categories, tens of thousands of attributes, and millions of candidate values. Deployed on a real-world e-commerce platform, it processes millions of product listings daily. Experiments on proprietary and public datasets validate its effectiveness and efficiency.
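The training idea summarized above can be illustrated with a minimal sketch. It assumes an InfoNCE-style contrastive loss and assumes that hard negatives are the other candidate values listed under the same category and attribute; the summary does not state the exact loss, so the function names, the temperature, and the toy taxonomy here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def sample_hard_negatives(taxonomy, category, attribute, positive):
    """Assumed sampling rule: sibling values under the same (category, attribute)."""
    return [v for v in taxonomy[(category, attribute)] if v != positive]

def info_nce(item_vec, pos_vec, neg_vecs, temperature=0.1):
    """InfoNCE-style loss for one item against one positive and several negatives."""
    logits = [cosine(item_vec, pos_vec) / temperature]
    logits += [cosine(item_vec, n) / temperature for n in neg_vecs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Toy taxonomy and value embeddings (hypothetical data, not from the paper).
taxonomy = {("T-shirt", "Color"): ["red", "blue", "green"]}
value_vecs = {"red": [1.0, 0.0, 0.0], "blue": [0.0, 1.0, 0.0], "green": [0.0, 0.0, 1.0]}

item_vec = [0.9, 0.1, 0.0]  # stand-in for an encoded product profile
negatives = sample_hard_negatives(taxonomy, "T-shirt", "Color", "red")
loss = info_nce(item_vec, value_vecs["red"], [value_vecs[n] for n in negatives])
```

Drawing negatives from sibling values forces the encoder to separate values that are easy to confuse, such as two colors of the same product type, rather than values from unrelated attributes.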

📝 Abstract
Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendations, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity to the item embedding. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial scenarios. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Moreover, it has been successfully deployed in a real-world e-commerce platform, processing millions of product listings daily while supporting dynamic, large-scale attribute taxonomies.
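The retrieval formulation in the abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: the cosine similarity measure, the per-attribute threshold table, and all names and vectors are assumptions, and the abstract does not say how the dynamic thresholds are derived.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve_values(item_vec, candidates, threshold):
    """Return (value, score) pairs whose similarity meets the threshold, best first.

    An empty result means no candidate is close enough, which is how this
    sketch represents a missing or out-of-distribution value.
    """
    scored = [(value, cosine(item_vec, vec)) for value, vec in candidates.items()]
    kept = [(value, score) for value, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings; in practice these would come from the trained encoders.
item_vec = [0.9, 0.1, 0.0]
color_candidates = {
    "red": [1.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0],
    "green": [0.0, 0.0, 1.0],
}
thresholds = {"Color": 0.8}  # assumed per-attribute thresholds

matches = retrieve_values(item_vec, color_candidates, thresholds["Color"])
no_match = retrieve_values(item_vec, color_candidates, 0.999)
```

Because every returned value comes from the candidate set, the output is normalized by construction, and new values can be supported by adding candidates without retraining a fixed classification head.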
Problem

Research questions and friction points this paper is trying to address.

Product Attribute Value Identification
Implicit Attributes
Normalized Output
Innovation

Methods, ideas, or system contributions that make the work stand out.

TACLR
Retrieval-based PAVI
Taxonomy-Aware Contrastive Learning
Authors

Yindu Su, Xiaohongshu Inc.
Huike Zou, Alibaba Group
Lin Sun, Qihoo 360 (large language model)
Ting Zhang, Singapore Management University
Haiyang Yang, Alibaba Group
Liyu Chen, Alibaba Group
David Lo, Singapore Management University
Qingheng Zhang, Alibaba Group
Shuguang Han, Google AI (information retrieval)
Jufeng Chen, Alibaba Group