🤖 AI Summary
This paper studies the online bidding problem in repeated first-price auctions under return-on-investment (ROI) and budget constraints, designing low-regret algorithms benchmarked against the ex-post optimal randomized policy. Because first-price auctions are strategically non-truthful, prior regret analyses were limited to weaker benchmarks or ignored ROI constraints; this work establishes, for the first time, a near-optimal regret bound relative to the stochastic ex-post optimum. Methodologically, it integrates online convex optimization with multi-armed bandit techniques, combining gradient estimation with confidence-interval construction to obtain adaptive bidding strategies in both the full-feedback and bandit-feedback settings. The analysis yields regret bounds of $\widetilde{O}(\sqrt{T})$ under full feedback and $\widetilde{O}(T^{3/4})$ under bandit feedback, both strictly improving upon prior results. The framework is the first to benchmark against the stochastic ex-post optimum while jointly respecting ROI and budget constraints in first-price auctions.
📝 Abstract
Automated bidding to optimize online advertising under various constraints, e.g., ROI and budget constraints, is widely adopted by advertisers. A key challenge lies in designing algorithms for non-truthful mechanisms with ROI constraints. While prior work has addressed truthful auctions, or non-truthful auctions against weaker benchmarks, this paper provides a significant improvement: we develop online bidding algorithms for repeated first-price auctions with ROI constraints, benchmarked against the optimal randomized strategy in hindsight. In the full-feedback setting, where the maximum competing bid is observed, our algorithm achieves a near-optimal $\widetilde{O}(\sqrt{T})$ regret bound; in the bandit-feedback setting, where the bidder observes only whether it wins each auction, our algorithm attains an $\widetilde{O}(T^{3/4})$ regret bound.
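To make the setting concrete, the sketch below simulates a dual-based bidder for repeated first-price auctions under an ROI constraint (total value won must be at least `gamma` times total spend) and a budget. It is a generic illustration of the standard Lagrangian-dual approach from the autobidding literature, not the paper's algorithm: the dual variables `lam` (ROI) and `mu` (budget pacing) are updated by online gradient ascent on per-round constraint violations, and the bid is the dual-adjusted value cap. All function and variable names here are hypothetical.

```python
import random

def dual_bidding_sketch(values, competing_bids, gamma=1.0, budget=50.0, eta=0.05):
    """Illustrative dual-based bidder for repeated first-price auctions.

    ROI constraint:    total value won >= gamma * total spend.
    Budget constraint: total spend <= budget.

    lam and mu are Lagrange multipliers for the two constraints, updated
    by online gradient ascent on per-round violations. This is a hedged
    sketch of the generic dual approach, NOT the paper's method.
    """
    T = len(values)
    rho = budget / T            # per-round budget pacing target
    lam, mu = 0.0, 0.0          # dual variables (kept non-negative)
    spend = value_won = 0.0
    for v, d in zip(values, competing_bids):
        # Dual-adjusted bid cap: bidding above it yields negative
        # Lagrangian-adjusted utility, so we bid at most the cap.
        cap = (1.0 + lam) * v / (1.0 + lam * gamma + mu)
        b = max(0.0, min(cap, budget - spend))  # never exceed remaining budget
        win = b >= d
        cost = b if win else 0.0    # first price: the winner pays its own bid
        gain = v if win else 0.0
        spend += cost
        value_won += gain
        # Online gradient ascent on the duals; project back onto [0, inf).
        lam = max(0.0, lam + eta * (gamma * cost - gain))
        mu = max(0.0, mu + eta * (cost - rho))
    return spend, value_won

random.seed(0)
T = 2000
vals = [random.random() for _ in range(T)]   # private values, i.i.d. uniform
comps = [random.random() for _ in range(T)]  # maximum competing bids
spend, value_won = dual_bidding_sketch(vals, comps, gamma=1.0, budget=50.0)
print(spend, value_won)
```

In this full-feedback simulation the bidder could also use the observed competing bids `d` to refine its bids counterfactually; the sketch keeps only the dual updates to stay short. Note that `cap <= v` whenever `mu >= 0` and `gamma = 1`, so each winning round pays at most the value won, which is why the ROI constraint holds here pathwise rather than only on average.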