The Evolution of Embedding Table Optimization and Multi-Epoch Training in Pinterest Ads Conversion

📅 2025-05-08
🤖 AI Summary
In Pinterest’s ads conversion prediction models, multi-epoch embedding training suffers from slow convergence and from overfitting after the first epoch, rooted in gradient sparsity, high-cardinality ID features, long-tailed ID distributions, and extremely sparse labels. To address this, the authors propose a Frequency-Adaptive Learning Rate (FALR) as an alternative to conventional embedding re-initialization: FALR adjusts the per-ID embedding learning rate based on each ID's occurrence frequency. They further show a strong correlation between the severity of multi-epoch overfitting in a multi-task model and the label sparsity of each objective. Combined with a Sparse Optimizer and end-to-end embedding learning, FALR achieves a 0.8% AUC gain on Pinterest's industrial dataset, delays overfitting by over 1.5 epochs, and improves training stability and generalization across multiple epochs.
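The core idea of a frequency-adaptive learning rate can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the class name, the inverse-log damping rule, and all parameters are assumptions chosen only to show how rare IDs keep a high learning rate while frequent IDs are damped.

```python
import math
from collections import defaultdict

# Hypothetical sketch of a frequency-adaptive learning rate (FALR) for
# embedding rows: rarely seen IDs keep a high learning rate so they can
# still converge, while frequently seen IDs are damped to curb
# multi-epoch overfitting. The inverse-log damping rule below is an
# illustrative assumption, not the paper's formula.

class FrequencyAdaptiveLR:
    def __init__(self, base_lr: float = 0.01):
        self.base_lr = base_lr
        self.counts = defaultdict(int)  # occurrences observed per ID

    def lr_for(self, feature_id: int) -> float:
        """Return the learning rate for this ID's embedding row."""
        self.counts[feature_id] += 1
        # Damp the rate as the ID's observed frequency grows.
        return self.base_lr / math.log(math.e + self.counts[feature_id] - 1)

falr = FrequencyAdaptiveLR(base_lr=0.01)
first = falr.lr_for(42)                            # first occurrence: full base rate
later = [falr.lr_for(42) for _ in range(99)][-1]   # 100th occurrence: damped
assert first == 0.01 and later < first
```

Under this kind of rule, head IDs (seen millions of times) effectively stop moving after the first epoch, while tail IDs continue to learn, which is one plausible reading of how such a scheme could delay overfitting without discarding learned embeddings.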

📝 Abstract
Deep learning for conversion prediction has found widespread applications in online advertising. These models have become more complex as they are trained to jointly predict multiple objectives such as click, add-to-cart, checkout and other conversion types. Additionally, the capacity and performance of these models can often be increased with the use of embedding tables that encode high-cardinality categorical features such as advertiser, user, campaign, and product identifiers (IDs). These embedding tables can be pre-trained, but also learned end-to-end jointly with the model to directly optimize the model objectives. Training these large tables is challenging due to gradient sparsity, the high cardinality of the categorical features, the non-uniform distribution of IDs, and the very high label sparsity. These issues make training prone to both slow convergence and overfitting after the first epoch. Previous works addressed the multi-epoch overfitting issue by using stronger feature hashing to reduce cardinality, filtering of low-frequency IDs, regularization of the embedding tables, re-initialization of the embedding tables after each epoch, etc. Some of these techniques reduce overfitting at the expense of reduced model performance if used too aggressively. In this paper, we share key learnings from the development of embedding table optimization and multi-epoch training in Pinterest Ads Conversion models. We showcase how our Sparse Optimizer speeds up convergence, and how multi-epoch overfitting varies in severity between different objectives in a multi-task model depending on label sparsity. We propose a new approach to deal with multi-epoch overfitting, the use of a frequency-adaptive learning rate on the embedding tables, and compare it to embedding re-initialization. We evaluate both methods offline using an industrial large-scale production dataset.
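The gradient sparsity the abstract describes comes from the fact that, in any one batch, only the embedding rows of IDs actually present receive gradients. A sparse optimizer exploits this by touching only those rows, so per-step cost scales with the batch rather than the table. The sketch below illustrates the idea under stated assumptions: the dict-backed table and the plain-SGD update rule are illustrative, not Pinterest's implementation.

```python
import random

# Minimal sketch of a sparse optimizer step: only embedding rows whose
# IDs appear in the current batch receive gradient updates, so work per
# step is proportional to the batch, not to the (huge) table. The
# dict-of-lists table and plain-SGD rule are illustrative assumptions.

def init_table(num_ids, dim, seed=0):
    rnd = random.Random(seed)
    return {i: [rnd.gauss(0.0, 0.01) for _ in range(dim)] for i in range(num_ids)}

def sparse_sgd_step(table, batch_grads, lr):
    """batch_grads maps each ID seen in the batch to its row gradient."""
    for fid, grad in batch_grads.items():
        row = table[fid]
        for d, g in enumerate(grad):
            row[d] -= lr * g

table = init_table(num_ids=1000, dim=4)
untouched = list(table[0])
sparse_sgd_step(table, {3: [1.0, 0.0, -1.0, 0.5]}, lr=0.1)
# Only row 3 moved; all other rows, e.g. row 0, are unchanged.
```

This same sparsity is also what makes tail-ID rows converge slowly: a row is updated only as often as its ID occurs, which is why the paper's frequency-dependent treatment of learning rates is a natural lever.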
Problem

Research questions and friction points this paper is trying to address.

Optimizing embedding tables for high cardinality categorical features
Addressing multi-epoch overfitting in deep learning conversion models
Improving convergence and performance with sparse optimizer techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Optimizer accelerates convergence speed
Frequency-adaptive learning rate reduces overfitting
Multi-task model adapts to label sparsity
Andrew Qiu
Pinterest Inc.
Shubham Barhate
Pinterest Inc.
H. Lui
University of California Irvine
Runze Su
Pinterest Inc.
Rafael Rios Muller
Pinterest Inc.
Kungang Li
Pinterest Inc.
Ling Leng
Pinterest Inc.
Han Sun
Pinterest Inc.
Shayan Ehsani
Stanford University
Zhifang Liu
School of Mathematical Sciences, Tianjin Normal University