Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies

📅 2026-01-06
🏛️ arXiv.org
🤖 AI Summary
This work proposes QGA, which the authors present as the first approach to integrate Q-value regularization with generative auto-bidding. It targets two limitations of existing auto-bidding methods: reliance on complex architectures with extensive hyperparameter tuning, and policy learning that is hindered by suboptimal historical trajectories. Built on the Decision Transformer framework, QGA adds Q-value regularization with double Q-learning, so that training jointly optimizes policy imitation and action-value maximization. A Q-guided dual-exploration mechanism then conditions the model on multiple return-to-go targets and locally perturbed actions, with the Q-value module scoring each candidate action. This design mitigates the adverse influence of suboptimal data and allows the policy to explore safely beyond the data distribution. Experiments on public benchmarks and simulated environments show superior or highly competitive results, and large-scale A/B tests report a 3.27% increase in advertising GMV and a 2.49% improvement in ROI.
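
As a rough illustration of what jointly optimizing policy imitation and action-value maximization can look like, the sketch below combines an imitation error with a Q-value term computed from two critics. The function names, the `eta` weight, the mean-squared imitation error, and the use of the minimum of two critics (a common clipped double-Q form) are all assumptions made for illustration; the summary does not state QGA's actual loss.

```python
import numpy as np


def q_regularized_loss(pred_actions, data_actions, q1_pred, q2_pred, eta=1.0):
    """Hypothetical joint objective: imitation error minus a weighted Q-value term.

    q1_pred and q2_pred are the two critics' values for the predicted actions.
    """
    # Imitation term: keep predicted bids close to the logged bids.
    imitation = np.mean((pred_actions - data_actions) ** 2)
    # Conservative value estimate: element-wise minimum of the two critics.
    q_min = np.minimum(q1_pred, q2_pred)
    # Minimizing this loss lowers imitation error and raises the Q estimate.
    return imitation - eta * np.mean(q_min)


def td_target(reward, gamma, q1_next, q2_next, done):
    """Hypothetical critic target using the smaller of two next-step Q-values."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)
```

With `eta = 0` the sketch reduces to pure imitation of the logged trajectories; larger values put more weight on actions the critics rate highly, which is one way a policy could move away from suboptimal data.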

📝 Abstract
With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These efforts imitate offline historical behaviors using complex structures that require expensive hyperparameter tuning. Suboptimal trajectories further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-value regularized Generative Auto-bidding method. In QGA, we plug Q-value regularization with a double Q-learning strategy into the Decision Transformer (DT) backbone. This design enables joint optimization of policy imitation and action-value maximization, allowing the learned bidding policy to both leverage experience from the dataset and alleviate the adverse impact of the suboptimal trajectories. Furthermore, to safely explore the policy space beyond the data distribution, we propose a Q-value guided dual-exploration mechanism, in which the DT model is conditioned on multiple return-to-go targets and locally perturbed actions. The entire exploration process is dynamically guided by the Q-value module, which provides a principled evaluation of each candidate action. Experiments on public benchmarks and simulation environments demonstrate that QGA consistently achieves superior or highly competitive results compared to existing alternatives. Notably, in large-scale real-world A/B testing, QGA achieves a 3.27% increase in Ad GMV and a 2.49% improvement in Ad ROI.
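
The dual-exploration idea described in the abstract can be sketched roughly as follows: propose one action per return-to-go target, add small local perturbations around each proposal, and keep the candidate that the Q-value module scores highest. This is a minimal sketch under stated assumptions, not the authors' implementation: `policy`, `q1`, `q2`, `noise_scale`, and `n_perturb` are hypothetical names, Gaussian noise is an assumed perturbation scheme, and scoring by the minimum of two critics is an assumed reading of how double Q-learning is used.

```python
import numpy as np


def select_bid_action(policy, q1, q2, state, rtg_targets,
                      noise_scale=0.05, n_perturb=4, rng=None):
    """Return the candidate action with the highest conservative Q-value."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = []
    for rtg in rtg_targets:
        # Exploration 1: one proposal per return-to-go target.
        base = np.asarray(policy(state, rtg), dtype=float)
        candidates.append(base)
        # Exploration 2: small local perturbations around that proposal.
        for _ in range(n_perturb):
            candidates.append(base + rng.normal(0.0, noise_scale, size=base.shape))
    candidates = np.asarray(candidates)
    # Score every candidate with the smaller of the two critic estimates.
    scores = np.array([min(q1(state, a), q2(state, a)) for a in candidates])
    return candidates[int(np.argmax(scores))], scores
```

In this sketch the critics act only as a selector over candidates that stay near the policy's own proposals, which is one plausible way to keep exploration close to the data distribution.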

Problem

Research questions and friction points this paper is trying to address.

auto-bidding
suboptimal trajectories
policy learning
reinforcement learning
generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-value regularization
Decision Transformer
auto-bidding
double Q-learning
dual-exploration mechanism

Authors

Mingming Zhang
Beihang University
big data
Na Li
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University; Taobao & Tmall Group of Alibaba, Wuhan, China
Feiqing Zhuang
Taobao & Tmall Group of Alibaba, Hangzhou, China
Hongyang Zheng
Taobao & Tmall Group of Alibaba, Hangzhou, China
Jiangbing Zhou
Taobao & Tmall Group of Alibaba, Hangzhou, China
Wuyin Wang
Taobao & Tmall Group of Alibaba, Hangzhou, China
Shengjie Sun
Taobao & Tmall Group of Alibaba, Hangzhou, China
Xiaowei Chen
Taobao & Tmall Group of Alibaba, Hangzhou, China
Junxiong Zhu
Taobao & Tmall Group of Alibaba, Hangzhou, China
Lixin Zou
Wuhan University
Information Retrieval, Recommender System, Reinforcement Learning, Large Language Model
Chenliang Li
School of Cyber Science and Engineering, Wuhan University
Information Retrieval, Data Mining, Natural Language Processing, Social Media