Learning to Bid in Non-Stationary Repeated First-Price Auctions

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies how to learn bidding strategies in non-stationary repeated first-price auctions for digital advertising, where changing market conditions and time-varying opponent behavior make the best fixed policy a poor benchmark. Instead of the conventional static benchmark, performance is measured by dynamic regret, that is, regret against the sum of the best achievable rewards at each time step. The authors introduce two non-stationarity measures that quantify the regularity of the bidding sequence and, when either measure is sub-linear in the time horizon, give a minimax-optimal characterization of the dynamic regret: a lower bound together with a bidding algorithm whose regret upper bound matches it. The contributions are threefold: (i) two quantitative metrics for non-stationarity in first-price auctions; (ii) a minimax lower bound on dynamic regret under these metrics; and (iii) an adaptive bidding algorithm that attains this bound.

📝 Abstract
First-price auctions have recently gained significant traction in digital advertising markets, exemplified by Google's transition from second-price to first-price auctions. Unlike in second-price auctions, where bidding one's private valuation is a dominant strategy, determining an optimal bidding strategy in first-price auctions is more complex. From a learning perspective, the learner (a specific bidder) can interact with the environment (other bidders) sequentially to infer their behaviors. Existing research often assumes specific environmental conditions and benchmarks performance against the best fixed policy (static benchmark). While this approach ensures strong learning guarantees, the static benchmark can deviate significantly from the optimal strategy in environments with even mild non-stationarity. To address such scenarios, a dynamic benchmark, which represents the sum of the best possible rewards at each time step, offers a more suitable objective. However, achieving no-regret learning with respect to the dynamic benchmark requires additional constraints. By inspecting reward functions in online first-price auctions, we introduce two metrics to quantify the regularity of the bidding sequence, which serve as measures of non-stationarity. We provide a minimax-optimal characterization of the dynamic regret when either of these metrics is sub-linear in the time horizon.
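To make the gap between the two benchmarks concrete, the following is a minimal sketch, not the paper's algorithm or notation. It assumes a simplified full-information setting in which the learner's value and the highest competing bid are known for every round, ties are won by the learner, and the static benchmark is the best fixed bid rather than a general fixed policy. The function names `reward`, `static_benchmark`, and `dynamic_benchmark` are hypothetical.

```python
from typing import Sequence


def reward(value: float, bid: float, highest_other_bid: float) -> float:
    """First-price reward: the winner pays its own bid (assumed: ties go to the learner)."""
    return value - bid if bid >= highest_other_bid else 0.0


def static_benchmark(values: Sequence[float], others: Sequence[float]) -> float:
    """Total reward of the best single bid used in every round.

    Lowering a bid to the largest competing bid below it keeps the same wins
    and pays less, so it suffices to search over the competing bids (and 0).
    """
    candidates = set(others) | {0.0}
    return max(
        sum(reward(v, b, m) for v, m in zip(values, others))
        for b in candidates
    )


def dynamic_benchmark(values: Sequence[float], others: Sequence[float]) -> float:
    """Sum of the best possible reward in each round.

    The per-round optimum is to bid exactly the highest competing bid when
    that is profitable, and to abstain otherwise.
    """
    return sum(max(v - m, 0.0) for v, m in zip(values, others))


if __name__ == "__main__":
    # Illustrative data (assumed): the competition shifts once, halfway through.
    values = [1.0] * 10
    others = [0.2] * 5 + [0.5] * 5
    print("static benchmark: ", static_benchmark(values, others))
    print("dynamic benchmark:", dynamic_benchmark(values, others))
```

With this illustrative data, a single shift in the competing bid is enough to separate the benchmarks: the best fixed bid earns roughly 5.0 in total, while bidding optimally in each round earns roughly 6.5. An algorithm with low regret against the static benchmark can therefore still leave substantial reward uncollected.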
Problem

Research questions and friction points this paper is trying to address.

First-price Auctions
Adaptive Bidding Strategy
Revenue Maximization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Regret Minimization
First-Price Auctions
Bid Order Regularity
Zihao Hu
Department of Mathematics, The Hong Kong University of Science and Technology; Department of IEDA, The Hong Kong University of Science and Technology
Xiaoyu Fan
Stern School of Business, New York University
Yuan Yao
Department of Mathematics, The Hong Kong University of Science and Technology
Jiheng Zhang
The Hong Kong University of Science and Technology
Applied Probability, Stochastic Modeling and Optimization, Numerical Methods and Algorithm
Zhengyuan Zhou
Dept of Technology, Operations and Statistics at NYU Stern
reinforcement learning, optimization, game theory, operations research