Generative Auto-Bidding with Value-Guided Explorations

📅 2025-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automated bidding strategies for dynamic advertising markets have three key limitations: rule-based approaches adapt poorly to changing market conditions; reinforcement learning (RL) methods struggle to capture historical dependencies and observations within a Markov Decision Process (MDP) formulation; and training on fixed offline datasets can lead to out-of-distribution (OOD) problems and behavioral collapse. This paper proposes GAVE, an offline generative auto-bidding framework with value-guided explorations. GAVE uses a score-based return-to-go (RTG) module to accommodate different advertising objectives, pairs an action exploration mechanism with RTG-based evaluation so that novel actions can be explored while updates remain stable, and adds a learnable value function that guides the direction of exploration and mitigates OOD problems. On two offline datasets and in real-world online A/B tests, GAVE outperforms state-of-the-art baselines. The implementation code is publicly available to support reproducibility and further research.

📝 Abstract

Auto-bidding, with its strong capability to optimize bidding decisions within dynamic and competitive online environments, has become a pivotal strategy for advertising platforms. Existing approaches typically employ rule-based strategies or Reinforcement Learning (RL) techniques. However, rule-based strategies lack the flexibility to adapt to time-varying market conditions, and RL-based methods struggle to capture essential historical dependencies and observations within Markov Decision Process (MDP) frameworks. Furthermore, these approaches often face challenges in ensuring strategy adaptability across diverse advertising objectives. Additionally, as offline training methods are increasingly adopted to facilitate the deployment and maintenance of stable online strategies, the issues of documented behavioral patterns and behavioral collapse resulting from training on fixed offline datasets become increasingly significant. To address these limitations, this paper introduces a novel offline Generative Auto-bidding framework with Value-Guided Explorations (GAVE). GAVE accommodates various advertising objectives through a score-based Return-To-Go (RTG) module. Moreover, GAVE integrates an action exploration mechanism with an RTG-based evaluation method to explore novel actions while ensuring stability-preserving updates. A learnable value function is also designed to guide the direction of action exploration and mitigate Out-of-Distribution (OOD) problems. Experimental results on two offline datasets and real-world deployments demonstrate that GAVE outperforms state-of-the-art baselines in both offline evaluations and online A/B tests. The implementation code is publicly available to facilitate reproducibility and further research.
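The return-to-go (RTG) signal mentioned in the abstract is, in its standard form, the sum of the rewards still to come in an episode. The sketch below assumes undiscounted scalar rewards per time step; the function name `returns_to_go` is hypothetical, and the paper's score-based RTG, which folds advertising objectives into the signal, is not reproduced here.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = rewards[t] + rewards[t+1] + ...

    Assumption: plain summed rewards, not the paper's score-based variant.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the suffix sum computed so far.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


if __name__ == "__main__":
    # Three bidding steps with rewards 1, 2 and 3.
    print(returns_to_go([1.0, 2.0, 3.0]))
```

A sequence model conditioned on RTG receives, at each step, the return it is expected to achieve from that point onward, which is how a target outcome can steer the generated actions.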

Problem

Research questions and friction points this paper is trying to address.

Improves auto-bidding adaptability in dynamic markets

Addresses historical dependency limitations in RL methods

Mitigates OOD issues in offline training strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Auto-bidding with Value-Guided Explorations (GAVE)

Score-based Return-To-Go (RTG) module

Action exploration with RTG-based evaluation
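The general idea behind value-guided exploration can be illustrated with a toy selection rule: propose an action that differs from the logged one, and keep it only when a value estimate prefers it. This is a sketch under stated assumptions, not the paper's training objective; `pick_action`, `toy_value`, and the scalar action representation are all hypothetical.

```python
from typing import Callable


def pick_action(logged_action: float,
                explored_action: float,
                value_fn: Callable[[float], float]) -> float:
    """Keep the explored action only if the value estimate prefers it.

    Falling back to the logged action keeps the choice close to the
    offline data whenever exploration does not look beneficial.
    """
    if value_fn(explored_action) > value_fn(logged_action):
        return explored_action
    return logged_action


def toy_value(action: float) -> float:
    """Hypothetical value estimate that peaks at a bid multiplier of 1.2."""
    return -(action - 1.2) ** 2


if __name__ == "__main__":
    # A small step towards the peak is accepted.
    print(pick_action(1.0, 1.1, toy_value))
    # A large step that overshoots the peak is rejected.
    print(pick_action(1.0, 2.0, toy_value))
```

In this toy setting the value function plays the role the abstract describes: it sets the direction in which exploration is worthwhile and discourages moves far from the logged behavior that the estimate does not support.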
