GAS: Generative Auto-bidding with Post-training Search

📅 2024-12-22
🏛️ The Web Conference
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address strategy misalignment in generative auto-bidding caused by low-quality training data and preference bias, this paper proposes GAS, a lightweight post-training search framework that adapts to diverse advertiser preferences without retraining. The method refines the output of a base generative policy (transformer- or diffusion-based) with an MCTS-inspired heuristic search, realizing a "weak-to-strong" search alignment mechanism. Small preference-specific transformer critics, trained with policy indications, guide the search, and a voting mechanism across critics improves search robustness; a search-based fine-tuning method is also provided for high-frequency preference scenarios to save computation. Evaluated on a real-world dataset and via an online A/B test on the Kuaishou advertising platform, GAS achieves significant improvements—e.g., a 4.60% increase in target cost—while balancing performance, cross-preference generalization, and deployment efficiency.

📝 Abstract
Auto-bidding is essential in facilitating online advertising by automatically placing bids on behalf of advertisers. Generative auto-bidding, which generates bids based on an adjustable condition using models like transformers and diffusers, has recently emerged as a new trend due to its potential to learn optimal strategies directly from data and adjust flexibly to preferences. However, generative models suffer from low-quality data leading to a mismatch between the condition, like return to go, and true action value, especially in long sequential decision-making. Besides, the majority preference in the dataset may hinder models' generalization ability on minority advertisers' preferences. While it is possible to collect high-quality data and retrain multiple models for different preferences, the high cost makes it unaffordable, hindering the advancement of auto-bidding into the era of large foundation models. To address this, we propose a flexible and practical Generative Auto-bidding scheme using post-training Search, termed GAS, to refine a base policy model's output and adapt to various preferences. We use weak-to-strong search alignment by training small critics for different preferences and an MCTS-inspired search to refine the model's output. Specifically, a novel voting mechanism with transformer-based critics trained with policy indications could enhance search alignment performance. Additionally, utilizing the search, we provide a fine-tuning method for high-frequency preference scenarios considering computational efficiency. Extensive experiments conducted on the real-world dataset and online A/B test on the Kuaishou advertising platform demonstrate the effectiveness of GAS, achieving significant improvements, e.g., 4.60% increment of target cost.
Problem

Research questions and friction points this paper is trying to address.

Generative auto-bidding models suffer from low-quality data causing condition-action mismatch.
Dataset majority preference limits model generalization for minority advertisers.
High cost of retraining models for diverse preferences hinders large-scale adoption.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Auto-bidding with post-training search
Weak-to-strong search alignment using critics
Transformer-based voting mechanism for alignment
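The core loop described above—a base policy proposing candidate actions, small preference-specific critics scoring them, and a vote selecting the refined action—can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy critic, and the greedy single-step selection (a stand-in for the MCTS-inspired search) are all illustrative assumptions.

```python
import random

# Illustrative sketch only: names and scoring rules are invented for this
# example, and a single greedy selection step stands in for the paper's
# MCTS-inspired search over bid sequences.

def base_policy(state, n_candidates=8, rng=random):
    """Sample candidate bid scalars around a naive base-policy output."""
    center = state["budget_left"] * 0.1
    return [max(0.0, center * (1 + rng.uniform(-0.5, 0.5)))
            for _ in range(n_candidates)]

def make_critic(preference_weight):
    """Toy preference-specific critic: preference-weighted value minus cost."""
    def critic(state, bid):
        expected_value = state["pvalue"] * min(1.0, bid / state["market_price"])
        return preference_weight * expected_value - bid
    return critic

def vote_select(state, candidates, critics):
    """Each critic votes for its top candidate; plurality wins (ties -> first)."""
    votes = [0] * len(candidates)
    for critic in critics:
        best = max(range(len(candidates)),
                   key=lambda i: critic(state, candidates[i]))
        votes[best] += 1
    return candidates[max(range(len(candidates)), key=lambda i: votes[i])]

def search_refine(state, critics, n_candidates=8, seed=0):
    """Refine the base policy's output by critic-voted candidate selection."""
    rng = random.Random(seed)
    candidates = base_policy(state, n_candidates, rng)
    return vote_select(state, candidates, critics)
```

A usage sketch: build one critic per advertiser preference and refine a bid for a given auction state, e.g. `search_refine({"budget_left": 100.0, "pvalue": 5.0, "market_price": 2.0}, [make_critic(w) for w in (0.5, 1.0, 2.0)])`. Because the critics are small and the base policy is frozen, adapting to a new preference only requires adding a critic, not retraining the generative model—which is the deployment advantage the paper emphasizes.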