🤖 AI Summary
This paper studies multi-objective online convex optimization (MOCO): given $K$ distinct loss function sequences, the algorithm must select a single action per round before observing that round's losses for any sequence. Departing from the standard single-objective setting, the paper adopts a min-max regret criterion, which measures the gap between the online policy's maximum total loss across the $K$ sequences and that of the static optimal action that minimizes the maximum total loss across all $K$ sequences. Under an i.i.d. input assumption, the authors design an efficient algorithm that combines the Hedge algorithm with online gradient descent (OGD). They prove that its expected min-max regret is bounded by $O(\sqrt{T \log K})$, achieving for the first time a logarithmic dependence on the number of objectives $K$. This bound matches the fundamental lower bound for this setting, establishing an optimal convergence rate.
📝 Abstract
In online convex optimization (OCO), a single loss function sequence is revealed over a time horizon $T$, and an online algorithm has to choose its action at time $t$ before the loss function at time $t$ is revealed. The goal of the online algorithm is to incur minimal penalty (called $\textit{regret}$) compared to a static optimal action chosen by an optimal offline algorithm that knows all functions of the sequence in advance.
In this paper, we broaden the horizon of OCO and consider multi-objective OCO, where there are $K$ distinct loss function sequences, and an algorithm has to choose its action at time $t$ before the $K$ loss functions at time $t$ are revealed. To capture the tradeoff between tracking the $K$ different sequences, we consider the $\textit{min-max}$ regret, where the benchmark (optimal offline algorithm) takes a static action across all time slots that minimizes the maximum of the total loss (summed across time slots) incurred by each of the $K$ sequences. An online algorithm is allowed to change its action across time slots, and its $\textit{min-max}$ regret is defined as the difference between its $\textit{min-max}$ cost and that of the benchmark. The $\textit{min-max}$ regret is a stringent performance measure, and an algorithm with small regret needs to "track" all loss function sequences closely at all times.
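The verbal definition above can be written out as a formula. The notation here is an assumption introduced for illustration and does not appear in the abstract: $f_t^k$ denotes the loss function of sequence $k$ at time $t$, $\mathcal{X}$ the feasible action set, and $x_t \in \mathcal{X}$ the action played by the online algorithm at time $t$. Under that notation, the $\textit{min-max}$ regret would read:

$$
R_T \;=\; \max_{1 \le k \le K} \sum_{t=1}^{T} f_t^k(x_t) \;-\; \min_{x \in \mathcal{X}} \, \max_{1 \le k \le K} \sum_{t=1}^{T} f_t^k(x).
$$

The first term is the online algorithm's $\textit{min-max}$ cost (the largest total loss over the $K$ sequences along its own trajectory), and the second term is the cost of the static benchmark.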
We consider this $\textit{min-max}$ regret in the i.i.d. input setting, where all loss functions are generated i.i.d. from an unknown distribution. For the i.i.d. model we propose a simple algorithm that combines the well-known $\textit{Hedge}$ and online gradient descent (OGD), and we show via a remarkably simple proof that its expected $\textit{min-max}$ regret is $O(\sqrt{T \log K})$.
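The abstract does not spell out how Hedge and OGD are combined, so the following Python sketch shows only one plausible reading, not the paper's algorithm. It assumes that Hedge keeps weights over the $K$ objectives (raising the weight of objectives with larger accumulated loss) and that OGD takes a projected step along the Hedge-weighted gradient. The function names, the Euclidean-ball feasible set, the quadratic test losses, and the step sizes `eta_hedge` and `eta_ogd` are all hypothetical choices made for illustration.

```python
import math
import random


def project_ball(x, radius):
    """Euclidean projection of x onto the ball of the given radius (assumed feasible set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def make_quadratic(center):
    """Illustrative convex loss f(x) = 0.5 * ||x - center||^2 and its gradient."""
    def loss(x):
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x, center))

    def grad(x):
        return [a - b for a, b in zip(x, center)]

    return loss, grad


def hedge_ogd(loss_rounds, dim, radius=1.0, eta_hedge=None, eta_ogd=None):
    """Sketch of a Hedge + OGD combination (an assumption, not the paper's pseudocode).

    loss_rounds: list of T rounds, each a list of K (loss_fn, grad_fn) pairs.
    Returns the played actions and the total loss of each of the K sequences.
    """
    T = len(loss_rounds)
    K = len(loss_rounds[0])
    # Step sizes of the usual sqrt(1/T) order; these exact constants are assumed.
    if eta_hedge is None:
        eta_hedge = math.sqrt(math.log(K) / T)
    if eta_ogd is None:
        eta_ogd = radius / math.sqrt(T)

    x = [0.0] * dim
    log_w = [0.0] * K   # Hedge log-weights over the K objectives
    totals = [0.0] * K  # total loss of each sequence along the played actions
    actions = []

    for round_losses in loss_rounds:
        actions.append(list(x))  # action is committed before losses are revealed

        # Normalize Hedge weights built from past rounds only.
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        s = sum(w)
        w = [v / s for v in w]

        # The K losses are now revealed at the played action.
        losses = [f(x) for f, _ in round_losses]
        grads = [g(x) for _, g in round_losses]

        # OGD: projected step along the Hedge-weighted gradient.
        mixed = [sum(w[k] * grads[k][i] for k in range(K)) for i in range(dim)]
        x = project_ball([x[i] - eta_ogd * mixed[i] for i in range(dim)], radius)

        # Hedge: objectives with larger loss gain weight.
        for k in range(K):
            totals[k] += losses[k]
            log_w[k] += eta_hedge * losses[k]

    return actions, totals


if __name__ == "__main__":
    random.seed(0)
    T, K, dim = 200, 4, 2
    # i.i.d. rounds: each objective's center is drawn around a fixed mean.
    rounds = [
        [make_quadratic([random.gauss(0.2 * k, 0.1) for _ in range(dim)]) for k in range(K)]
        for _ in range(T)
    ]
    _, totals = hedge_ogd(rounds, dim)
    print("min-max cost of the online algorithm:", max(totals))
```

The $\textit{min-max}$ cost of the online algorithm is `max(totals)`. Computing the regret would additionally require solving the offline benchmark problem, which this sketch omits.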