🤖 AI Summary
This work addresses online convex optimization problems in which both the loss functions and the constraints depend on a finite window of the learner's past decisions—a structure that arises, for instance, in constrained dynamic control and in scheduling with reconfiguration budgets. The study investigates this memory setting under time-varying constraints and proposes algorithms that simultaneously achieve sublinear regret and sublinear cumulative constraint violation, both without predictions and with short-horizon (possibly unreliable) forecasts. The key contribution is establishing, for the first time, these simultaneous sublinear guarantees for constrained online optimization with memory. By modeling predictive information as delayed feedback, the authors design an adaptive, robust optimistic online learning algorithm that maintains its theoretical guarantees in the absence of predictions and improves automatically as prediction accuracy increases, thereby bridging the gap between classical constrained online optimization and memory-dependent settings.
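The "optimistic" update mentioned above is a standard idea in online learning: act on a predicted gradient (hint), then correct with the true gradient once it is revealed. The sketch below is a generic illustration of optimistic online gradient descent, not the paper's algorithm; the projection set, step size, and problem data are all assumptions.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Euclidean projection onto the ball of radius r (the assumed decision set)."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

def optimistic_ogd(grads, hints, eta=0.1):
    """Generic optimistic OGD sketch: play using the hint m_t, then update
    the secondary iterate with the true gradient g_t once it is revealed."""
    d = grads[0].shape[0]
    y = np.zeros(d)                      # secondary ("lazy") iterate
    plays = []
    for g, m in zip(grads, hints):
        x = project_ball(y - eta * m)    # act optimistically on the prediction
        plays.append(x)
        y = project_ball(y - eta * g)    # correct with the revealed gradient
    return plays
```

When the hints are accurate (m_t ≈ g_t), each played point already accounts for the upcoming gradient, which lowers regret; when the hints are uninformative (m_t = 0), the method reduces to plain projected OGD, illustrating the graceful degradation the summary describes.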
📝 Abstract
We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of the learner's past decisions. This setting extends the previously studied framework of unconstrained online optimization with memory and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves with prediction accuracy while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.
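To make the setting concrete, the sketch below runs projected online gradient descent with a growing penalty on constraint violation, on a toy memory-1 loss (a tracking term plus a switching cost coupling consecutive decisions). This is an illustrative stand-in under assumed problem data (`thetas`, `a_seq`, `b_seq`) and penalty schedule, not the adaptive penalty algorithm from the paper.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Euclidean projection onto the ball of radius r (the assumed decision set)."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

def penalized_ogd(thetas, a_seq, b_seq, eta=0.1, lam0=1.0):
    """Penalized OGD sketch for a memory-1 loss
    f_t(x_t, x_{t-1}) = ||x_t - theta_t||^2 + ||x_t - x_{t-1}||^2
    with a time-varying constraint a_t . x - b_t <= 0."""
    T, d = thetas.shape
    x = np.zeros(d)
    x_prev = np.zeros(d)
    decisions, violations = [], []
    for t in range(T):
        viol = max(0.0, float(a_seq[t] @ x) - b_seq[t])
        decisions.append(x.copy())
        violations.append(viol)
        lam = lam0 * np.sqrt(t + 1)            # assumed growing penalty weight
        # Gradient of the memory-1 loss plus the subgradient of (lam/2)*viol^2.
        grad = 2 * (x - thetas[t]) + 2 * (x - x_prev) + lam * viol * a_seq[t]
        x_prev, x = x, project_ball(x - eta * grad)
    return np.array(decisions), np.array(violations)
```

The switching-cost term makes the gradient at round t depend on the previous decision, which is the simplest instance of the memory dependence the abstract describes; the growing penalty weight trades off loss minimization against cumulative violation.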