🤖 AI Summary
In online structured prediction, standard surrogate regret bounds compare against a single fixed estimator and therefore break down in non-stationary environments, where the data distribution shifts over time.
Method: We propose the first theoretical framework that integrates dynamic regret analysis with surrogate-gap techniques. The approach bounds the cumulative target loss jointly in terms of the comparator sequence's path length and its cumulative surrogate loss, yielding a tight bound. We also introduce a new Polyak-type adaptive learning rate with provable target-loss guarantees and extend the Fenchel–Young loss via convolution to accommodate general structured output spaces.
Contribution/Results: We prove a matching lower bound showing that the dynamic regret bound's dependence on both the path-length and cumulative-surrogate-loss terms is optimal. Empirical evaluation demonstrates that our method significantly reduces cumulative target loss across diverse non-stationary tasks, consistently outperforming state-of-the-art online classification and structured prediction baselines.
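In the abstract's notation, the guarantee can be restated as follows, where $u_1, \dots, u_T$ is any comparator sequence, $\ell_t^{\mathrm{target}}$ and $\ell_t^{\mathrm{surr}}$ are the round-$t$ target and surrogate losses, and $\hat{y}_t$ is the learner's prediction (the norm defining the path length is our assumption, not necessarily the paper's exact choice):

$$
\sum_{t=1}^{T} \ell_t^{\mathrm{target}}(\hat{y}_t) \le F_T + C\,(1 + P_T),
\qquad
F_T = \sum_{t=1}^{T} \ell_t^{\mathrm{surr}}(u_t),
\qquad
P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert.
$$

Since the right-hand side depends on $T$ only through $F_T$ and $P_T$, the bound stays small whenever some slowly moving comparator sequence achieves low surrogate loss, even when every fixed comparator fails.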
📝 Abstract
Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. In this setting, the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur a surrogate loss growing linearly in $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
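To make the setup concrete, here is a minimal sketch of online binary classification with OGD on the logistic surrogate under an abrupt distribution shift. The Polyak-style step $\eta_t = \ell_t(w_t) / \lVert g_t \rVert^2$ is an assumed form inspired by the classical Polyak step size, not necessarily the paper's exact rule; the data-generating process is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 2000

def logistic_loss(w, x, y):
    # y in {-1, +1}; surrogate loss log(1 + exp(-y <w, x>)), computed stably
    return np.logaddexp(0.0, -y * x.dot(w))

def logistic_grad(w, x, y):
    # gradient of the logistic loss w.r.t. w
    z = -y * x.dot(w)
    return -y * x / (1.0 + np.exp(-z))

w = np.zeros(d)
w_true = rng.normal(size=d)
cum_target, cum_surrogate = 0.0, 0.0
for t in range(T):
    if t == T // 2:
        w_true = -w_true          # abrupt shift: any fixed comparator now fails
    x = rng.normal(size=d)
    y = 1.0 if x.dot(w_true) + 0.1 * rng.normal() > 0 else -1.0
    y_hat = 1.0 if x.dot(w) > 0 else -1.0
    cum_target += float(y_hat != y)          # 0-1 target loss
    loss = logistic_loss(w, x, y)
    cum_surrogate += loss                    # surrogate loss of the learner
    g = logistic_grad(w, x, y)
    gn = g.dot(g)
    if gn > 1e-12:
        eta = loss / gn                      # Polyak-style step (assumed form)
        w -= eta * g
```

Note that although $\eta_t$ can be large when the gradient is small, the update magnitude $\eta_t \lVert g_t \rVert = \ell_t(w_t) / \lVert g_t \rVert$ stays bounded for the logistic loss, so the iterates do not blow up; tracking `cum_target` against the surrogate loss of a piecewise-constant comparator illustrates the $F_T + C(1 + P_T)$ quantities.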