Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest

๐Ÿ“… 2025-09-05
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Traditional ad ranking utility functions, tuned manually, suffer from ambiguous optimization objectives, tightly coupled parameters, and insufficient personalization and adaptation to seasonality. This paper proposes a deep reinforcement learning framework for personalized, dynamic utility parameter optimization, formulating parameter tuning as a policy learning problem. The optimal policy is learned end-to-end directly from online serving logs, circumventing high-variance value estimation while enabling real-time adaptation and user-level personalization. The approach combines a multi-objective reward design with policy gradient optimization. Large-scale A/B testing demonstrates significant improvements over manually tuned baselines: +9.7% in click-through rate (CTR) and +7.7% in long click-through rate, substantially improving the balance of value among the platform, advertisers, and users.
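The utility function being tuned here can be sketched as a weighted linear combination of per-ad predictions, where the weights are the hyperparameters the policy selects per request. A minimal sketch; the objective names and weight values are illustrative, not taken from the paper.

```python
# Hedged sketch: a ranking utility that linearly combines model predictions.
# The objective keys ("ctr", "long_click", "bid") and the weight values are
# hypothetical; the paper does not publish its exact utility terms.
def ranking_utility(preds: dict, weights: dict) -> float:
    """Score an ad candidate as a weighted sum of predicted objectives."""
    return sum(weights[k] * preds[k] for k in weights)

# One candidate ad's predictions and one (policy-chosen) weight vector.
candidate = {"ctr": 0.03, "long_click": 0.01, "bid": 1.2}
weights = {"ctr": 2.0, "long_click": 3.0, "bid": 1.0}
score = ranking_utility(candidate, weights)  # 2*0.03 + 3*0.01 + 1*1.2
```

In manual tuning the `weights` vector is a single global setting; DRL-PUT instead has a policy model emit it per ad request, which is what enables personalization and seasonal adaptation.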

๐Ÿ“ Abstract
The ranking utility function in an ad recommender system, which linearly combines predictions of various business goals, plays a central role in balancing values across the platform, advertisers, and users. Traditional manual tuning, while offering simplicity and interpretability, often yields suboptimal results due to its unprincipled tuning objectives, the vast amount of parameter combinations, and its lack of personalization and adaptability to seasonality. In this work, we propose a general Deep Reinforcement Learning framework for Personalized Utility Tuning (DRL-PUT) to address the challenges of multi-objective optimization within ad recommender systems. Our key contributions include: 1) Formulating the problem as a reinforcement learning task: given the state of an ad request, we predict the optimal hyperparameters to maximize a pre-defined reward. 2) Developing an approach to directly learn an optimal policy model using online serving logs, avoiding the need to estimate a value function, which is inherently challenging due to the high variance and unbalanced distribution of immediate rewards. We evaluated DRL-PUT through an online A/B experiment in Pinterest's ad recommender system. Compared to the baseline manual utility tuning approach, DRL-PUT improved the click-through rate by 9.7% and the long click-through rate by 7.7% on the treated segment. We conducted a detailed ablation study on the impact of different reward definitions and analyzed the personalization aspect of the learned policy model.
Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-objective ad ranking utility function
Replacing manual tuning with adaptive reinforcement learning
Personalizing hyperparameters to improve engagement metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning for personalized utility tuning
Direct policy learning from online serving logs
Optimal hyperparameter prediction to maximize reward
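The direct policy learning idea above can be sketched as a REINFORCE-style update on logged serving data: a policy maps request state features to a distribution over discretized hyperparameter choices, and logged rewards reweight the log-likelihood gradient, with no value function estimated. Everything below (the candidate actions, state features, learning rate) is an illustrative assumption, not the paper's exact formulation.

```python
import math

# Hypothetical discretized actions: candidate multipliers for one utility weight.
ACTIONS = [0.5, 1.0, 2.0]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def policy_logits(theta, state):
    # Linear policy: one logit per action, computed from the state features.
    return [sum(w * x for w, x in zip(row, state)) for row in theta]

def reinforce_update(theta, state, action_idx, reward, lr=0.1):
    """One policy-gradient step on a logged (state, action, reward) tuple.

    Gradient of log pi(a|s) w.r.t. theta[i] is (1{i==a} - pi(i|s)) * state.
    """
    probs = softmax(policy_logits(theta, state))
    for i, row in enumerate(theta):
        coef = ((1.0 if i == action_idx else 0.0) - probs[i]) * reward * lr
        for j in range(len(row)):
            row[j] += coef * state[j]
    return theta

# One logged example: 2 state features, chosen action index 2, positive reward.
theta = [[0.0, 0.0] for _ in ACTIONS]
theta = reinforce_update(theta, state=[1.0, 0.5], action_idx=2, reward=1.0)
```

After the update, the policy assigns higher probability to the rewarded action for similar states. Learning from logged tuples this way sidesteps fitting a value function to sparse, high-variance immediate rewards, which is the difficulty the paper highlights.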
๐Ÿ”Ž Similar Papers
No similar papers found.
Xiao Yang, Pinterest Inc., San Francisco, California, USA
Mehdi Ben Ayed, Pinterest Inc., New York City, New York, USA
Longyu Zhao, Pinterest Inc., San Francisco, California, USA
Fan Zhou, Pinterest Inc., San Francisco, California, USA
Yuchen Shen, CMU
Abe Engle, Pinterest Inc., San Francisco, California, USA
Jinfeng Zhuang, Pinterest Inc., San Francisco, California, USA
Ling Leng, Pinterest Inc., Seattle, Washington, USA
Jiajing Xu, Pinterest
Charles Rosenberg, Pinterest
Prathibha Deshikachar, Pinterest Inc., San Francisco, California, USA