GSID: Generative Semantic Indexing for E-Commerce Product Understanding

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
On second-hand e-commerce platforms, structured representation is hindered by insufficient coverage of long-tail items and misalignment between manually defined categories and buyer preferences. To address this, we propose Generative Semantic Indexing (GSID), a data-driven approach that abandons handcrafted rules. GSID leverages unstructured item metadata and employs domain-adaptive pretraining to learn semantic embeddings, then generates differentiable, optimization-friendly structured semantic codes in a task-oriented manner. This enables end-to-end, data-driven structural modeling. Deployed on a real-world e-commerce platform, GSID significantly enhances item understanding: it improves average AUC by 3.2–5.7 percentage points across downstream tasks (including search relevance ranking, personalized recommendation, and category prediction), demonstrating strong generalizability and practical efficacy.

📝 Abstract
Structured representation of product information is a major bottleneck for the efficiency of e-commerce platforms, especially second-hand e-commerce platforms. Currently, most product information is organized based on manually curated product categories and attributes, which often fail to adequately cover long-tail products and do not align well with buyer preferences. To address these problems, we propose Generative Semantic InDexings (GSID), a data-driven approach to generating structured product representations. GSID consists of two key components: (1) pre-training on unstructured product metadata to learn in-domain semantic embeddings, and (2) generating more effective semantic codes tailored for downstream product-centric applications. Extensive experiments validate the effectiveness of GSID, and it has been successfully deployed on a real-world e-commerce platform, achieving promising results on product understanding and other downstream tasks.
Problem

Research questions and friction points this paper is trying to address.

Generating structured product representations for e-commerce platforms
Addressing limitations of manual categorization for long-tail products
Creating semantic codes aligned with buyer preferences and applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative semantic indexing for product structured representations
Pre-training on unstructured metadata for semantic embeddings
Generating semantic codes for downstream e-commerce applications
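
The summary does not say how GSID turns embeddings into semantic codes. As a rough illustration of what embedding-derived "semantic codes" can look like, the sketch below applies residual k-means quantization, a generic technique that is swapped in here for illustration and is not claimed to be the authors' method. The random embeddings, the function names (`kmeans`, `residual_codes`, `reconstruct`), and the parameters `levels` and `k` are all hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)

def residual_codes(embeddings, levels=3, k=4):
    """Quantize each embedding into `levels` discrete codes, coarse to fine."""
    residual = embeddings.astype(float).copy()
    codebooks, codes = [], []
    for level in range(levels):
        centroids, assign = kmeans(residual, k, seed=level)
        codebooks.append(centroids)
        codes.append(assign)
        # The next level only has to explain what this level missed.
        residual = residual - centroids[assign]
    return np.stack(codes, axis=1), codebooks

def reconstruct(codes, codebooks):
    """Approximate embeddings by summing the selected centroid per level."""
    return sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))

# Hypothetical stand-in for pretrained item embeddings (assumption).
rng = np.random.default_rng(42)
item_embeddings = rng.normal(size=(200, 16))

codes, codebooks = residual_codes(item_embeddings, levels=3, k=4)
err_one_level = ((item_embeddings - reconstruct(codes[:, :1], codebooks[:1])) ** 2).sum()
err_all_levels = ((item_embeddings - reconstruct(codes, codebooks)) ** 2).sum()
```

Each row of `codes` is a short tuple of integers that acts like a learned, hierarchical category path: items sharing a prefix fall into the same coarse cluster. Comparing `err_one_level` with `err_all_levels` shows how additional levels refine the approximation of the original embedding.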
Authors

Haiyang Yang, Xianyu of Alibaba
Qinye Xie, Xianyu of Alibaba
Qingheng Zhang, Xianyu of Alibaba
Liyu Chen, Xianyu of Alibaba
Huike Zou, Xianyu of Alibaba
Chengbao Lian, Xianyu of Alibaba
Shuguang Han, Google AI (information retrieval)
Fei Huang, Xianyu of Alibaba
Jufeng Chen, Xianyu of Alibaba
Bo Zheng, Xianyu of Alibaba; Taobao & Tmall Group of Alibaba