Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement

📅 2025-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Decision Transformers (DTs) for automated advertising bidding suffer from reliance on manually specified return-to-go (RTG) targets and degradation under mixed-quality trajectory data. Method: We propose the R* Decision Transformer framework, featuring a novel three-stage RTG optimization mechanism: (i) memorized RTG ($R_{ ext{DT}}$), (ii) predicted optimal RTG ($hat{R}_{ ext{DT}}$), and (iii) simulation-driven high-reward trajectory selection and augmentation ($R^*_{ ext{DT}}$), enabling progressive policy improvement. The method integrates sequential bidding modeling, dynamic RTG prediction, and simulation-augmented trajectory generation. Contribution/Results: Evaluated on public bidding datasets, R* DT significantly improves long-term revenue and ROI stability while exhibiting strong robustness to low-quality trajectories. It achieves an average ROI gain of 12.7% over baseline DTs, establishing a scalable, adaptive paradigm for generative automated bidding.

Technology Category

Application Category

📝 Abstract
In the realm of online advertising, advertisers partake in ad auctions to obtain advertising slots, frequently taking advantage of auto-bidding tools provided by demand-side platforms. To improve the automation of these bidding systems, we adopt generative models, namely the Decision Transformer (DT), to tackle the difficulties inherent in automated bidding. Applying the Decision Transformer to the auto-bidding task enables a unified approach to sequential modeling, which efficiently overcomes short-sightedness by capturing long-term dependencies between past bidding actions and user behavior. Nevertheless, conventional DT has certain drawbacks: (1) DT necessitates a preset return-to-go (RTG) value before generating actions, which is not inherently produced; (2) The policy learned by DT is restricted by its training data, which is consists of mixed-quality trajectories. To address these challenges, we introduce the R* Decision Transformer (R* DT), developed in a three-step process: (1) R DT: Similar to traditional DT, R DT stores actions based on state and RTG value, as well as memorizing the RTG for a given state using the training set; (2) R^ DT: We forecast the highest value (within the training set) of RTG for a given state, deriving a suboptimal policy based on the current state and the forecasted supreme RTG value; (3) R* DT: Based on R^ DT, we generate trajectories and select those with high rewards (using a simulator) to augment our training dataset. This data enhancement has been shown to improve the RTG of trajectories in the training data and gradually leads the suboptimal policy towards optimality. Comprehensive tests on a publicly available bidding dataset validate the R* DT's efficacy and highlight its superiority when dealing with mixed-quality trajectories.
Problem

Research questions and friction points this paper is trying to address.

Auto-bidding lacks long-term dependency modeling in ad auctions
Decision Transformer requires preset RTG values not inherently generated
Mixed-quality training data limits policy performance in auto-bidding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Decision Transformer for auto-bidding automation
Introduces R* DT to optimize return-to-go values
Enhances training data with high-reward trajectories
🔎 Similar Papers
No similar papers found.
H
Hao Jiang
Kuaishou Technology, Beijing, China
Yongxiang Tang
Yongxiang Tang
Unknown affiliation
Y
Yanxiang Zeng
Kuaishou Technology, Beijing, China
P
Pengjia Yuan
Kuaishou Technology, Beijing, China
Yanhua Cheng
Yanhua Cheng
快手
Computer VisionMachine LearningRecommendation
T
Teng Sha
Kuaishou Technology, Beijing, China
Xialong Liu
Xialong Liu
Kuaishou Technology
Machine LearningRecommendation
P
Peng Jiang
Kuaishou Technology, Beijing, China