🤖 AI Summary
This paper studies online learning under adversarial environments with long-term budget constraints: in each round, after the learner selects an action, an adversary reveals $\alpha$-approximately convex cost and resource consumption functions; the goal is to minimize cumulative cost over $T$ rounds while satisfying a total resource budget $B_T$. We propose the first first-order online algorithm for this setting, unifying full-information and bandit feedback. Our algorithm introduces a dynamic resource regulation mechanism and achieves $O(\sqrt{T})$ $\alpha$-regret and $O(B_T \log T) + \tilde{O}(\sqrt{T})$ total resource consumption under both feedback models, with both bounds tight in their dependence on $T$ and $B_T$. This is the first result achieving optimal regret and resource consumption guarantees for adversarial $\alpha$-approximately convex optimization under long-term constraints, significantly improving upon prior theoretical guarantees for classic problems such as adversarial multi-armed bandits with knapsack constraints.
📝 Abstract
We study an online learning problem with long-term budget constraints in the adversarial setting. In this problem, at each round $t$, the learner selects an action from a convex decision set, after which the adversary reveals a cost function $f_t$ and a resource consumption function $g_t$. The cost and consumption functions are assumed to be $\alpha$-approximately convex, a broad class that generalizes convexity and encompasses many common non-convex optimization problems, including DR-submodular maximization, Online Vertex Cover, and Regularized Phase Retrieval. The goal is to design an online algorithm that minimizes cumulative cost over a horizon of length $T$ while approximately satisfying a long-term budget constraint of $B_T$. We propose an efficient first-order online algorithm that guarantees $O(\sqrt{T})$ $\alpha$-regret against the optimal fixed feasible benchmark while consuming at most $O(B_T \log T) + \tilde{O}(\sqrt{T})$ resources in both full-information and bandit feedback settings. In the bandit feedback setting, our approach yields an efficient solution for the $\texttt{Adversarial Bandits with Knapsacks}$ problem with improved guarantees. We also prove matching lower bounds, demonstrating the tightness of our results. Finally, we characterize the class of $\alpha$-approximately convex functions and show that our results apply to a broad family of problems.
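To make the problem setup concrete, here is a minimal toy sketch of the *generic* primal-dual online gradient descent approach to long-term constraints, which this line of work builds on. This is an assumption-laden illustration, not the paper's algorithm: the decision set, cost family $f_t(x) = (x - c_t)^2$, consumption $g_t(x) = x$, per-round budget `b`, and step size `eta` are all hypothetical choices made for the example.

```python
import numpy as np

# Toy instance (all choices hypothetical, for illustration only):
# scalar decision x in [0, 1]; adversary reveals convex cost
# f_t(x) = (x - c_t)^2 and consumption g_t(x) = x each round;
# per-round budget b so that B_T = b * T.
rng = np.random.default_rng(0)
T, b, eta = 500, 0.3, 0.05
x, lam = 0.5, 0.0            # primal iterate and dual "price" on the budget
consumed, total_cost = 0.0, 0.0

for t in range(T):
    c_t = rng.uniform(0.0, 1.0)          # adversary's cost parameter this round
    total_cost += (x - c_t) ** 2
    consumed += x                        # resource used: g_t(x) = x
    grad = 2.0 * (x - c_t) + lam * 1.0   # gradient of the Lagrangian in x
    x = min(1.0, max(0.0, x - eta * grad))   # projected gradient step onto [0, 1]
    lam = max(0.0, lam + eta * (x - b))      # dual ascent on budget violation
```

The dual variable `lam` acts as a price that rises when consumption exceeds the per-round budget, pushing subsequent primal steps toward cheaper resource use; this is the basic mechanism that dynamic resource regulation schemes refine to get the tighter $O(B_T \log T) + \tilde{O}(\sqrt{T})$ consumption guarantees.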