🤖 AI Summary
This study addresses the challenge of generating high-quality user purchase labels for training campaign ranking models in e-commerce, where campaign creatives often lack a direct mapping to specific product types. The authors propose Campaign-2-PT-RAG, a label generation framework based on large language models (LLMs). The approach uses an LLM to interpret campaign intent, retrieves candidate product types from the platform’s product taxonomy via semantic search, and applies a structured LLM-based classifier to judge each candidate’s relevance, yielding a campaign-specific coverage set of product types. User purchases that match the coverage set become positive labels at scale. By reframing the inherently ambiguous attribution problem as a tractable semantic alignment task, the method achieves label precision of 78%–90% and recall above 99% on internal and synthetic datasets, validated against expert-annotated campaign–product type mappings. The framework targets campaign ranking optimization in production e-commerce environments.
📝 Abstract
E-commerce campaign ranking models require large-scale training labels indicating which users purchased because of a campaign's influence. However, generating these labels is challenging because campaigns use creative, thematic language that does not map directly to product purchases. Without clear product-level attribution, supervised learning for campaign optimization remains limited. We present Campaign-2-PT-RAG, a scalable label generation framework that constructs user–campaign purchase labels by inferring which product types (PTs) each campaign promotes. The framework first interprets campaign content using large language models (LLMs) to capture implicit intent, then retrieves candidate PTs through semantic search over the platform taxonomy. A structured LLM-based classifier evaluates each PT's relevance, producing a campaign-specific product coverage set. User purchases matching these PTs generate positive training labels for ranking models. This approach reframes the ambiguous attribution problem as a tractable semantic alignment task, enabling scalable and consistent supervision for downstream tasks such as campaign ranking optimization in production e-commerce environments. Experiments on internal and synthetic datasets, validated against expert-annotated campaign–PT mappings, show that our LLM-assisted approach generates high-quality labels with 78–90% precision while maintaining over 99% recall.
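The pipeline described in the abstract (retrieve candidate PTs, classify each one, build a coverage set, then match purchases) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The taxonomy, the function names, and the example campaign are all hypothetical. Token-overlap scoring stands in for semantic search, and a simple shared-token rule stands in for the structured LLM classifier. The campaign intent string is assumed to be the output of the LLM interpretation step.

```python
# Hypothetical toy taxonomy: product type (PT) -> short description.
TAXONOMY = {
    "tent": "camping tent outdoor shelter",
    "sleeping bag": "camping sleeping bag outdoor",
    "grill": "outdoor barbecue grill cooking",
    "laptop": "portable computer laptop electronics",
}


def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def retrieve_candidates(intent, taxonomy, k=3):
    """Stand-in for semantic search: rank PTs by Jaccard token overlap."""
    query = tokens(intent)
    scored = []
    for pt, description in taxonomy.items():
        desc = tokens(description)
        score = len(query & desc) / len(query | desc)
        scored.append((score, pt))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pt for score, pt in scored[:k] if score > 0]


def classify_relevant(intent, pt, taxonomy):
    """Stand-in for the LLM classifier: require at least two shared tokens."""
    return len(tokens(intent) & tokens(taxonomy[pt])) >= 2


def coverage_set(intent, taxonomy, k=3):
    """PTs that survive both retrieval and classification."""
    candidates = retrieve_candidates(intent, taxonomy, k)
    return {pt for pt in candidates if classify_relevant(intent, pt, taxonomy)}


def positive_labels(campaign_id, coverage, purchases):
    """purchases: iterable of (user_id, product_type) pairs."""
    return {(user, campaign_id) for user, pt in purchases if pt in coverage}


def precision_recall(predicted, expert):
    """Compare a predicted coverage set with an expert-annotated one."""
    hits = len(predicted & expert)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expert) if expert else 0.0
    return precision, recall


intent = "outdoor camping gear for summer"  # assumed LLM-interpreted intent
coverage = coverage_set(intent, TAXONOMY)
purchases = [("u1", "tent"), ("u2", "laptop"), ("u3", "grill"), ("u4", "sleeping bag")]
labels = positive_labels("c1", coverage, purchases)
```

In this toy run, the coverage set contains "tent" and "sleeping bag", so only the purchases by u1 and u4 become positive labels for campaign c1. The two-stage design lets retrieval cast a wide net while the classifier prunes weakly related candidates such as "grill".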