🤖 AI Summary
Existing automated bidding methods struggle to model long-term dependencies, explore efficiently, and protect advertisers' budgets at the same time, and they lack a unified mechanism that pairs exploration with a safe fallback. To close this gap, the authors propose GUIDE, a framework built around an integrated "explore-safeguard-select" pipeline. GUIDE uses a Decision Transformer (DT) to jointly model historical bidding actions and environmental state transitions, a Q-value module to direct the DT's exploration through regularization constraints, and an Inverse Dynamics Module (IDM) to infer a safe fallback action from DT-predicted future states. The Q-value module then adaptively selects the final bid between the exploratory action and the fallback action. In large-scale online deployment on Taobao, GUIDE outperforms state-of-the-art baselines, with a 4.10% increase in ad GMV, a 1.40% gain in ad clicks, and a 3.52% improvement in ad ROI.
📝 Abstract
Automated bidding is central to modern digital advertising. Early rule-based methods lacked adaptability, while subsequent Reinforcement Learning approaches modeled bidding as a Markov Decision Process but struggled with long-term dependencies. Recent generative models show promise, yet they lack explicit mechanisms to balance exploration and safety, relying solely on action perturbations or trajectory guidance without a safety fallback. This results in inefficient exploration and elevated financial risk for advertising platforms.
To address this gap, we propose GUIDE (Generative Auto-Bidding with Unified Modeling and Exploration), a framework that integrates directed exploration with a safe fallback mechanism. GUIDE employs a Decision Transformer (DT) to jointly model historical bidding actions and environmental state transitions. A Q-value module guides the DT's exploration via regularization constraints, while an Inverse Dynamics Module (IDM) uses DT-predicted future states to infer robust, behaviorally consistent actions that serve as a safe fallback policy. The Q-value module then adaptively selects the final action between the DT's exploratory action and the IDM's fallback action, balancing exploration and safety. Together, these components form an integrated "explore-safeguard-select" pipeline that unifies efficiency and safety.
We conduct extensive experiments on public datasets, in simulated auction environments, and through large-scale online deployment on Taobao, a leading Chinese advertising platform. Results show GUIDE consistently outperforms state-of-the-art baselines across all scenarios. In real-world deployment, GUIDE achieves notable gains: +4.10% ad GMV, +1.40% ad clicks, +1.66% ad cost, and +3.52% ad ROI, demonstrating its effectiveness and strong industrial applicability.
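The "select" step of the explore-safeguard-select pipeline can be illustrated with a minimal sketch. The abstract does not state the selection rule, so this example assumes the simplest plausible one: keep the exploratory DT action only when the critic scores it higher than the IDM fallback. The names `select_bid`, `toy_q`, and the `margin` parameter are hypothetical and do not come from the paper.

```python
from typing import Callable, Sequence

State = Sequence[float]
# Assumed critic interface: maps (state, bid action) to an estimated value.
QFunction = Callable[[State, float], float]


def select_bid(
    state: State,
    explore_action: float,
    fallback_action: float,
    q_value: QFunction,
    margin: float = 0.0,
) -> float:
    """Choose between an exploratory action and a safe fallback action.

    Assumption: the exploratory action is kept only if its Q estimate
    exceeds the fallback's estimate by more than `margin` (a hypothetical
    safety knob); otherwise the fallback action is returned.
    """
    q_explore = q_value(state, explore_action)
    q_fallback = q_value(state, fallback_action)
    if q_explore > q_fallback + margin:
        return explore_action
    return fallback_action


def toy_q(state: State, action: float) -> float:
    """Hypothetical critic that prefers bids close to the first state feature."""
    return -((action - state[0]) ** 2)


if __name__ == "__main__":
    # The fallback bid (1.0) is closer to the toy target (1.2) than the
    # exploratory bid (1.5), so the fallback is selected here.
    print(select_bid([1.2, 0.5], explore_action=1.5, fallback_action=1.0, q_value=toy_q))
```

In a real system the two candidate actions would come from the DT and the IDM, and the critic would be the learned Q-value module. The toy critic here only stands in to make the sketch runnable.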