🤖 AI Summary
This paper studies how to distribute wins over time in budget-constrained repeated auctions. The problem arises in online retail, cloud services, and digital advertising, where a bidder must balance maximizing conversion value against spacing its wins evenly (to avoid congested demand and to sustain visibility). We model the value of a win as a concave function of the time elapsed since the last win, so for a fixed number of wins, evenly spaced wins are optimal. Theoretically, we show that the problem is essentially equivalent to an infinite-horizon Markov decision process (MDP) in which the budget constraint holds in expectation, even when that MDP is restricted to a very small number of states. We also show that state-dependent policies are necessary for sublinear regret. Algorithmically, we give a computationally efficient online learning algorithm for second-price auctions in a Bayesian setting. It bids as a function of the context and the time since the last win (or conversion), and attains $\tilde{O}(\sqrt{T})$ regret. We further show that state-independent (memoryless) policies incur linear regret, although some of them still achieve a $(1-1/e)$-approximation to the optimal reward.
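The concave dependence on elapsed time is what makes evenly spaced wins attractive. The sketch below is a minimal illustration under several assumptions. It uses a hypothetical value function $\sqrt{g}$ of the gap $g$ since the previous win. Gaps are measured from time 0, and the gaps must sum to a fixed horizon. The helpers `win_value` and `best_spacing` are illustrative names. The sketch brute-forces every placement of a fixed number of wins:

```python
import math
from itertools import combinations

def win_value(gap: int) -> float:
    """Hypothetical concave value of a win arriving `gap` rounds after the previous win."""
    return math.sqrt(gap)

def best_spacing(horizon: int, wins: int):
    """Brute-force the gaps between wins (assumed to sum to `horizon`) with the largest total value."""
    best_gaps, best_total = None, -math.inf
    # Each choice of wins - 1 cut points in 1..horizon-1 gives one split of the horizon into gaps.
    for cuts in combinations(range(1, horizon), wins - 1):
        points = (0, *cuts, horizon)
        gaps = tuple(b - a for a, b in zip(points, points[1:]))
        total = sum(win_value(g) for g in gaps)
        if total > best_total:
            best_gaps, best_total = gaps, total
    return best_gaps, best_total

gaps, total = best_spacing(horizon=12, wins=3)
print(gaps, total)  # (4, 4, 4) 6.0
```

By Jensen's inequality, any strictly concave value function makes the equal split optimal whenever the horizon divides evenly. Here an uneven split such as (3, 4, 5) totals about 5.97, which is less than 6.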
📝 Abstract
In many repeated auction settings, participants care not only about how frequently they win but also about how their wins are distributed over time. This problem arises in various practical domains where avoiding congested demand is crucial, such as online retail sales and compute services, as well as in advertising campaigns that require sustained visibility over time. We introduce a simple model of this phenomenon: a budgeted auction in which the value of a win is a concave function of the time since the last win. This implies that for a given number of wins, even spacing over time is optimal. We also extend our model and results to the case when not all wins result in "conversions" (realization of actual gains), and the probability of conversion depends on a context. The goal is then to maximize and evenly space conversions rather than just wins. We study the optimal policies for this setting in second-price auctions and offer learning algorithms for the bidders that achieve low regret against the optimal bidding policy in a Bayesian online setting. Our main result is a computationally efficient online learning algorithm that achieves $\tilde{O}(\sqrt{T})$ regret. We achieve this by showing that an infinite-horizon Markov decision process (MDP) with the budget constraint in expectation is essentially equivalent to our problem, even when that MDP is limited to a very small number of states. The algorithm achieves low regret by learning a bidding policy that chooses bids as a function of the context and the system's state, which is the time elapsed since the last win (or conversion). We show that state-independent strategies incur linear regret even without uncertainty about conversions. We complement this by showing that there are state-independent strategies that, while still having linear regret, achieve a $(1-\frac{1}{e})$ approximation to the optimal reward.
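The MDP whose state is the time elapsed since the last win can be sketched as follows. This is a minimal planning illustration, not the paper's learning algorithm. It makes several assumptions:

- The competing-bid distribution is known: a hypothetical Uniform[0, 1] highest competing bid in a second-price auction.
- Wins are valued by a hypothetical function $\sqrt{s}$.
- The state is capped at K = 3.
- A hypothetical per-round budget of 0.05 is enforced only in expectation, through the stationary distribution.

The names `stationary_metrics` and `best_policy` are illustrative. The code computes the long-run reward and spend of a bidding policy, then grid-searches both state-dependent policies and state-independent (constant-bid) policies:

```python
import itertools
import math

def value(s: int) -> float:
    """Hypothetical concave value of a win when s rounds have elapsed since the last win."""
    return math.sqrt(s)

def stationary_metrics(bids):
    """Long-run average (reward, spend) per round of a state-dependent bidding policy.

    State s = 1..K is the time since the last win, capped at K = len(bids).
    Assumes the highest competing bid is Uniform[0, 1] in a second-price auction,
    so bid b wins with probability b and pays b**2 / 2 in expectation.
    """
    K = len(bids)
    if K < 2:
        raise ValueError("need at least two states")
    win = [min(max(b, 0.0), 1.0) for b in bids]
    if win[-1] <= 0.0:
        raise ValueError("bid in the capped state K must be positive")
    # Unnormalized stationary weights: a win resets to state 1, a loss moves s -> s + 1,
    # and the capped state K stays put on a loss.
    u = [1.0]
    for i in range(1, K - 1):
        u.append(u[-1] * (1.0 - win[i - 1]))
    u.append(u[-1] * (1.0 - win[K - 2]) / win[K - 1])
    total = sum(u)
    pi = [x / total for x in u]
    reward = sum(p * w * value(s) for s, (p, w) in enumerate(zip(pi, win), start=1))
    spend = sum(p * w * w / 2.0 for p, w in zip(pi, win))
    return reward, spend

def best_policy(K, budget, grid, state_dependent=True):
    """Grid-search the highest-reward policy whose long-run spend per round is within budget."""
    if state_dependent:
        candidates = itertools.product(grid, repeat=K)
    else:
        candidates = ((b,) * K for b in grid)  # one constant bid in every state
    best_bids, best_reward = None, -math.inf
    for bids in candidates:
        if bids[-1] <= 0.0:
            continue  # stationary_metrics needs a positive bid in the capped state
        reward, spend = stationary_metrics(bids)
        if spend <= budget and reward > best_reward:
            best_bids, best_reward = bids, reward
    return best_bids, best_reward

grid = [i / 10 for i in range(11)]
dep_bids, dep_reward = best_policy(K=3, budget=0.05, grid=grid)
ind_bids, ind_reward = best_policy(K=3, budget=0.05, grid=grid, state_dependent=False)
print("state-dependent:", dep_bids, round(dep_reward, 4))
print("state-independent:", ind_bids, round(ind_reward, 4))
```

Every constant bid is also a state-dependent policy, so on this grid the state-dependent optimum can only match or exceed the state-independent one. The linear-regret separation stated in the abstract is a general result and is not demonstrated by this small instance.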