🤖 AI Summary
In automated advertising bidding, diffusion models suffer from high generation uncertainty and insufficient dynamic validity between adjacent states, leading to ineffective bids. To address this, we propose a causal diffusion completer–aligner framework. First, we introduce a historical-sequence completion mechanism to enhance temporal consistency and dynamic validity across adjacent bidding states. Second, we construct a trajectory-level return model that explicitly aligns the generative process with advertisers' long-term optimization objectives, such as conversion value and target cost per action (tCPA). Our method integrates stochastic-variable augmentation during training and sequence-level cumulative reward modeling to improve bid flexibility and interpretability. Empirically, our approach achieves a 29.9% increase in conversion value on large-scale auto-bidding benchmarks under sparse-reward settings, significantly outperforming existing baselines, and delivers a 2.0% improvement in target cost in online experiments on the Kuaishou advertising platform.
📝 Abstract
Auto-bidding is central to computational advertising, achieving notable commercial success by optimizing advertisers' bids within economic constraints. Recently, large generative models have shown potential to revolutionize auto-bidding by generating bids that flexibly adapt to complex, competitive environments. Among them, diffusers stand out for their ability to address sparse-reward challenges by focusing on trajectory-level accumulated rewards, as well as for their explainability, i.e., planning a future trajectory of states and executing bids accordingly. However, diffusers struggle with generation uncertainty, particularly regarding dynamic legitimacy between adjacent states, which can lead to poor bids and, in a highly competitive auction environment, a significant loss of ad impression opportunities to other advertisers. To address this, we propose a Causal auto-Bidding method based on a Diffusion completer-aligner framework, termed CBD. First, we augment the diffusion training process with an extra random variable t: the model observes a t-length historical sequence and is trained to complete the remaining sequence, thereby enhancing the dynamic legitimacy of generated sequences. Second, we employ a trajectory-level return model to refine the generated trajectories so that they align more closely with advertisers' objectives. Experimental results across diverse settings demonstrate that our approach not only achieves superior performance on large-scale auto-bidding benchmarks, such as a 29.9% improvement in conversion value in the challenging sparse-reward auction setting, but also delivers significant improvements on the Kuaishou online advertising platform, including a 2.0% increase in target cost.
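The completer-aligner idea in the abstract can be illustrated with a minimal sketch: the "completer" clamps an observed t-length history at every denoising step while sampling the remaining suffix, and the "aligner" reranks candidate completed trajectories by a trajectory-level return. All names here (`complete_with_history`, `align_by_return`, `denoise_fn`, `return_fn`) are hypothetical; in CBD the denoiser is a learned diffusion model and the return model is trained, not the simple stand-ins used below.

```python
import random

random.seed(0)

def complete_with_history(history, horizon, num_candidates, denoise_fn):
    """Completer (illustrative): sample candidate suffixes conditioned on the
    observed t-length history. The prefix is re-clamped after every denoising
    step so generated states stay consistent with what was actually executed."""
    t = len(history)
    candidates = []
    for _ in range(num_candidates):
        # Start from the observed prefix plus a noisy suffix.
        traj = list(history) + [random.gauss(0.0, 1.0) for _ in range(horizon)]
        for _ in range(5):                 # a few denoising iterations
            traj = denoise_fn(traj)
            traj[:t] = history             # clamp the observed history
        candidates.append(traj)
    return candidates

def align_by_return(candidates, return_fn):
    """Aligner (illustrative): score each completed trajectory with a
    trajectory-level return model and keep the best-aligned one."""
    return max(candidates, key=return_fn)

# Usage with toy stand-ins: a shrinkage "denoiser" and total reward as return.
history = [1.0, 1.0, 1.0]
smooth = lambda traj: [0.9 * s for s in traj]
cands = complete_with_history(history, horizon=5, num_candidates=4,
                              denoise_fn=smooth)
best = align_by_return(cands, return_fn=sum)
```

The re-clamping step is what targets dynamic legitimacy: every candidate agrees exactly with the executed history, so the model only has to generate a plausible continuation rather than an entire trajectory from scratch.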