🤖 AI Summary
In real-time bidding (RTB), existing two-stage bid shading methods suffer from restrictive unimodal assumptions, error propagation across stages, and sample selection bias. To address these issues, we propose Generative Bid Shading (GBS), an end-to-end generative framework that produces shading ratios autoregressively through stepwise residuals, capturing complex value dependencies without relying on a predefined unimodal prior. A reward preference alignment system uses a Channel-aware Hierarchical Dynamic Network (CHNet) as the reward model, together with surplus optimization and exploration-utility reward alignment modules, and applies Group Relative Policy Optimization (GRPO) to optimize both short-term and long-term surplus, including under non-convex surplus curves. Offline evaluations and online A/B tests validate the effectiveness of GBS. Deployed on Meituan's demand-side platform (DSP), GBS serves billions of bid requests daily and helps advertisers avoid overspending.
📝 Abstract
Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting the bid to keep advertisers from overspending. Existing mainstream two-stage methods first model the bid landscape and then optimize surplus using operations research techniques. They are constrained by unimodal assumptions that fail to adapt to non-convex surplus curves, and they are vulnerable to cascading errors in sequential workflows. Additionally, existing models that discretize continuous values ignore the dependence between discrete intervals, which reduces the model's error-correction ability, while sample selection bias in bidding scenarios makes prediction harder still. To address these issues, this paper introduces Generative Bid Shading (GBS), which comprises two primary components: (1) an end-to-end generative model that uses an autoregressive approach to generate shading ratios through stepwise residuals, capturing complex value dependencies without relying on predefined priors; and (2) a reward preference alignment system, which incorporates a channel-aware hierarchical dynamic network (CHNet) as the reward model to extract fine-grained features, along with modules for surplus optimization and exploration utility reward alignment, ultimately optimizing both short-term and long-term surplus using group relative policy optimization (GRPO). Extensive offline experiments and online A/B tests validate GBS's effectiveness. Moreover, GBS has been deployed on the Meituan DSP platform, serving billions of bid requests daily.
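To make the surplus objective and the group-relative step of GRPO more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a first-price setting where expected surplus is (value minus shaded bid) times win probability, and it replaces the CHNet reward model with a toy logistic win-rate curve. The names `win_probability`, `expected_surplus`, and `group_relative_advantages`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
import math
import statistics


def win_probability(bid: float, market_price: float, scale: float = 0.2) -> float:
    """Toy logistic win-rate curve (assumption; stands in for a learned reward model)."""
    return 1.0 / (1.0 + math.exp(-(bid - market_price) / scale))


def expected_surplus(value: float, original_bid: float, ratio: float,
                     market_price: float) -> float:
    """Expected first-price surplus: (value - shaded bid) * P(win | shaded bid)."""
    shaded_bid = original_bid * ratio
    return (value - shaded_bid) * win_probability(shaded_bid, market_price)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One bid request: a group of candidate shading ratios (hypothetical samples
# that a generative policy might propose for the same request).
candidate_ratios = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
rewards = [
    expected_surplus(value=2.0, original_bid=2.0, ratio=r, market_price=1.2)
    for r in candidate_ratios
]
advantages = group_relative_advantages(rewards)

# Candidates with positive advantage would be reinforced, the rest discouraged.
best_ratio = candidate_ratios[advantages.index(max(advantages))]
```

In this toy setup the unshaded bid (ratio 1.0) wins often but earns zero surplus, so it receives a negative advantage, while the 0.7 candidate receives the highest one. Because advantages are computed relative to the group, no separate value network is needed to provide a baseline.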