AI Summary
This paper studies online convex optimization (OCO) under adversarial constraints, aiming to jointly minimize dynamic regret and cumulative constraint violation (CCV). To overcome the $\tilde{O}(\sqrt{T})$ lower bound on CCV from prior work, we propose the first tunable trade-off framework, introducing a free parameter $\beta \in [0,1/2]$. Under general convexity, it achieves $\tilde{O}(dT^{1-\beta})$ CCV and $\tilde{O}(\sqrt{dT} + T^{\beta})$ dynamic regret; under smoothness, this improves to $O(T^{\max\{1/2,\beta\}})$ regret and $\tilde{O}(T^{1-\beta})$ CCV. Key technical innovations include an adaptive small-loss analysis, a constrained-expert formulation, a reduction via convex-set covering, and a tailored gradient-descent design. Our approach is the first to break the conventional $\tilde{O}(\sqrt{T})$ CCV barrier, establishing a novel continuous regret–CCV trade-off paradigm.
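To make the trade-off concrete, the following sketch (an illustration of the stated general-convex bounds, not the paper's algorithm) tabulates the growth exponents of $T$ in the regret bound $\tilde{O}(\sqrt{dT} + T^{\beta})$ and the CCV bound $\tilde{O}(dT^{1-\beta})$ as $\beta$ varies over $[0, 1/2]$, ignoring dimension and logarithmic factors:

```python
# Illustrative only: exponents of T in the general-convex bounds
# regret ~ sqrt(dT) + T^beta  and  CCV ~ d * T^(1 - beta),
# with the tunable parameter beta in [0, 1/2] (d and log factors ignored).
def tradeoff_exponents(beta: float) -> tuple[float, float]:
    """Return (regret exponent, CCV exponent) of T for a given beta."""
    assert 0.0 <= beta <= 0.5
    return max(0.5, beta), 1.0 - beta

for beta in (0.0, 0.25, 0.5):
    r, v = tradeoff_exponents(beta)
    print(f"beta={beta:.2f}: regret ~ T^{r:.2f}, CCV ~ T^{v:.2f}")
```

Note that in this regime the regret exponent stays at $1/2$ while the CCV exponent drops from $1$ toward $1/2$ as $\beta$ grows, which is the sense in which smaller CCV is bought without worsening the regret rate.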
Abstract
We revisit the Online Convex Optimization problem with adversarial constraints (COCO) where, in each round, a learner is presented with a convex cost function and a convex constraint function, both of which may be chosen adversarially. The learner selects actions from a convex decision set in an online fashion, with the goal of minimizing both the regret and the cumulative constraint violation (CCV) over a horizon of $T$ rounds. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. In this paper, we present a surprising improvement that achieves a significantly smaller CCV by trading it off against regret. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $\tilde{O}(\sqrt{dT} + T^{\beta})$ regret and $\tilde{O}(dT^{1-\beta})$ CCV, where $d$ is the dimension of the decision set and $\beta \in [0,1]$ is a tunable parameter. We obtain this result by first considering the special case of the $\textsf{Constrained Expert}$ problem, where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose an efficient policy for the $\textsf{Constrained Expert}$ problem that attains $O(\sqrt{T \ln N} + T^{\beta})$ regret and $\tilde{O}(T^{1-\beta} \ln N)$ CCV, where $N$ is the number of experts. The original problem is then reduced to the $\textsf{Constrained Expert}$ problem via a covering argument. Finally, under an additional smoothness assumption, we propose an efficient gradient-based policy attaining $O(T^{\max(\frac{1}{2},\beta)})$ regret and $\tilde{O}(T^{1-\beta})$ CCV.
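As a rough intuition for the expert-based building block, the sketch below runs a generic multiplicative-weights (Hedge) update over $N$ experts on a penalized linear loss, cost plus $\lambda$ times the constraint value. This is a hypothetical simplification: the fixed penalty `lam` and learning rate `eta_lr` are placeholders, whereas the paper's policy relies on an adaptive small-loss scheme; the sketch only illustrates the shape of a $\textsf{Constrained Expert}$ interaction.

```python
import numpy as np

def hedge_with_penalty(costs, violations, lam=2.0, eta_lr=0.1):
    """Toy Hedge update on the penalized loss cost + lam * violation.

    costs, violations: (T, N) arrays of per-round linear losses, one entry
    per expert; returns the cumulative expected cost and cumulative
    positive-part constraint violation of the played distributions.
    NOT the paper's algorithm -- lam and eta_lr are fixed placeholders.
    """
    T, N = costs.shape
    log_w = np.zeros(N)                 # log-weights, for numerical stability
    total_cost, total_ccv = 0.0, 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                    # current distribution over experts
        total_cost += p @ costs[t]
        total_ccv += max(0.0, p @ violations[t])
        log_w -= eta_lr * (costs[t] + lam * violations[t])
    return total_cost, total_ccv
```

Raising `lam` pushes weight away from violating experts faster, shrinking CCV at the price of higher cost; the paper's tunable $\beta$ plays an analogous role at the level of rates, but through an adaptive mechanism rather than a fixed penalty.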