Online Bidding under RoS Constraints without Knowing the Value

📅 2025-03-05
🤖 AI Summary
This paper addresses the joint learning-and-decision problem in real-time bidding for online advertising under dual constraints: budget and return-on-spend (RoS). Advertisers must simultaneously optimize bidding strategies and learn unknown per-impression values (e.g., conversion rates) from stochastic feedback, facing a fundamental exploration-exploitation trade-off. To this end, we propose the first UCB-based online bidding algorithm that unifies stochastic feedback modeling, constrained optimization, and exploration. Theoretically, our algorithm achieves optimal regret and constraint violation bounds of $\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$, eliminating reliance on prior knowledge of value distributions, a key limitation of existing methods. Empirical evaluation on synthetic benchmarks demonstrates substantial improvements over state-of-the-art baselines in both RoS and budget utilization.
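To get a feel for how the stated bound scales, the following minimal sketch evaluates $\sqrt{T\log(|\mathcal{B}|T)}$ for a few horizons. The function name `bound_rate` and the choice of 100 bids are illustrative assumptions; constants and any factors hidden by the $\widetilde{O}$ notation are ignored, so only the growth trend is meaningful.

```python
import math

def bound_rate(T, num_bids):
    """sqrt(T * log(|B| * T)); constants and hidden factors are dropped (assumption)."""
    return math.sqrt(T * math.log(num_bids * T))

# Print the horizon, the total bound, and the per-round average.
for T in (10**3, 10**4, 10**5):
    total = bound_rate(T, num_bids=100)
    print(T, round(total, 1), round(total / T, 4))
```

The per-round column shrinks as $T$ grows, which is the sense in which a sublinear bound means the average regret and average constraint violation vanish over time.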

📝 Abstract
We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the value generated by winning each impression (e.g., conversions), we address the more realistic setting where the advertiser must simultaneously learn the optimal bidding strategy and the value of each impression opportunity. This introduces a challenging exploration-exploitation dilemma: the advertiser must balance exploring different bids to estimate impression values with exploiting current knowledge to bid effectively. To address this, we propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off. Via a rigorous theoretical analysis, we prove that our algorithm achieves $\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$ regret and constraint violation, where $T$ is the number of bidding rounds and $\mathcal{B}$ is the domain of possible bids. This establishes the first optimal regret and constraint violation bounds for bidding in the online setting with unknown impression values. Moreover, our algorithm is computationally efficient and simple to implement. We validate our theoretical findings through experiments on synthetic data, demonstrating that our algorithm exhibits strong empirical performance compared to existing approaches.
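The abstract does not spell out the algorithm, so the following is only a generic illustration of the idea, not the paper's method: a primal-dual loop that keeps optimistic (UCB) value estimates per bid and uses two dual variables to penalize budget and RoS violations. Everything here is an assumption made for the sketch: the names `ucb_bidding` and `first_price_auction`, the first-price payment rule, values in [0, 1], an RoS target of 1, the bonus term, and the step size.

```python
import math
import random

def ucb_bidding(bids, T, budget, sample_outcome, eta=None, seed=0):
    """Generic primal-dual UCB sketch (hypothetical, not the paper's algorithm).

    sample_outcome(bid, rng) -> (value, cost), assumed to satisfy cost <= bid.
    """
    rng = random.Random(seed)
    eta = eta if eta is not None else 1.0 / math.sqrt(T)  # dual step size
    rho = budget / T                     # per-round spend target
    n = {b: 0 for b in bids}             # times each bid was played
    v_sum = {b: 0.0 for b in bids}       # cumulative observed value per bid
    c_sum = {b: 0.0 for b in bids}       # cumulative observed cost per bid
    lam, mu = 0.0, 0.0                   # duals: budget, RoS (value >= spend)
    total_value, total_cost = 0.0, 0.0

    def index(b):
        if n[b] == 0:
            return math.inf              # play every bid at least once
        bonus = math.sqrt(2.0 * math.log(len(bids) * T) / n[b])
        v_ucb = v_sum[b] / n[b] + bonus              # optimistic value
        c_lcb = max(0.0, c_sum[b] / n[b] - bonus)    # optimistic cost
        return v_ucb - lam * (c_lcb - rho) + mu * (v_ucb - c_lcb)

    for _ in range(T):
        # Only bids that cannot overspend the remaining budget are allowed.
        affordable = [b for b in bids if b <= budget - total_cost]
        if not affordable:
            break
        b = max(affordable, key=index)
        value, cost = sample_outcome(b, rng)
        n[b] += 1
        v_sum[b] += value
        c_sum[b] += cost
        total_value += value
        total_cost += cost
        # Dual ascent: raise a multiplier when its constraint is violated.
        lam = max(0.0, lam + eta * (cost - rho))
        mu = max(0.0, mu + eta * (cost - value))
    return total_value, total_cost

def first_price_auction(bid, rng):
    """Toy environment: win if bid beats a uniform competing bid, pay the bid."""
    if bid < rng.random():
        return 0.0, 0.0
    value = 1.0 if rng.random() < 0.6 else 0.0   # unknown conversion rate
    return value, bid

bids = [i / 10 for i in range(1, 10)]
value, cost = ucb_bidding(bids, 2000, 400.0, first_price_auction)
print("value:", value, "spend:", round(cost, 2))
```

In this sketch the budget is enforced as a hard cap through the `affordable` filter, while the RoS constraint is only encouraged through the dual variable `mu`, so it can be violated over a finite horizon.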
Problem

Research questions and friction points this paper is trying to address.

Maximize value in online advertising with budget and RoS constraints.
Learn optimal bidding strategy and impression values simultaneously.
Propose UCB-style algorithm for exploration-exploitation trade-off management.
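The constrained problem described in the first item can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the paper: $v_t(b)$ and $p_t(b)$ denote the value and payment obtained in round $t$ when bidding $b$, $\rho$ is the per-round budget rate, and $\gamma$ is the RoS target.

$$
\begin{aligned}
\max_{b_1,\dots,b_T \in \mathcal{B}} \quad & \sum_{t=1}^{T} v_t(b_t) \\
\text{s.t.} \quad & \sum_{t=1}^{T} p_t(b_t) \le \rho T && \text{(budget)} \\
& \sum_{t=1}^{T} v_t(b_t) \ge \gamma \sum_{t=1}^{T} p_t(b_t) && \text{(RoS)}
\end{aligned}
$$

In the setting considered here, $v_t$ is not known in advance and must be estimated from feedback, which is what creates the exploration-exploitation trade-off.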
Innovation

Methods, ideas, or system contributions that make the work stand out.

UCB-style algorithm for online bidding
Manages exploration-exploitation trade-off effectively
Achieves optimal regret and constraint violation bounds
Authors
Sushant Vijayan
TIFR
Bandits, Differential games
Zhe Feng
Google Research, Mountain View, USA
Swati Padmanabhan
Assistant Professor at the University of Minnesota Twin Cities
algorithms, convex optimization, semidefinite programs, online optimization, nonconvex optimization
Karthikeyan Shanmugam
Google DeepMind, Bengaluru, India
Arun Suggala
Google DeepMind, Bengaluru, India
Di Wang
Google Research, Mountain View, USA