🤖 AI Summary
This work addresses the performance degradation caused by feedback delays in online convex optimization by introducing a continuous-time reduction framework that adaptively extends any online linear optimization algorithm to handle round-dependent delays. By decomposing regret into a delay-independent learning term and a delay-induced drift term, the proposed approach achieves, for the first time in bandit convex optimization, a delay-dependent regret bound scaling only with the square root of the total delay, i.e., $O(\sqrt{d_{\text{tot}}})$, significantly improving on the previous best-known bound of $O(\min\{\sqrt{T d_{\text{max}}}, (T d_{\text{tot}})^{1/3}\})$. In the strongly convex setting, the analysis further incorporates a refined delay measure $\sigma_{\text{max}}$, the maximum number of outstanding observations, reducing the delay-related regret to $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\})$.
📝 Abstract
We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under which regret decomposes into a delay-independent learning term and a delay-induced drift term, yielding a delay-adaptive reduction that converts any algorithm for online linear optimization into one that handles round-dependent delays. For bandit convex optimization, we significantly improve existing regret bounds, with delay-dependent terms matching state-of-the-art first-order rates. For first-order feedback, we recover state-of-the-art regret bounds via a simpler, unified analysis. Quantitatively, for bandit convex optimization we obtain $O(\sqrt{d_{\text{tot}}} + T^{\frac{3}{4}}\sqrt{k})$ regret, improving the delay-dependent term from $O(\min\{\sqrt{T d_{\text{max}}},(Td_{\text{tot}})^{\frac{1}{3}}\})$ in previous work to $O(\sqrt{d_{\text{tot}}})$. Here, $k$, $T$, $d_{\text{max}}$, and $d_{\text{tot}}$ denote the dimension, time horizon, maximum delay, and total delay, respectively. Under strong convexity, we achieve $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\} + (T^2\ln T)^{\frac{1}{3}} {k}^{\frac{2}{3}})$, improving the delay-dependent term from $O(d_{\text{max}} \ln T)$ in previous work to $O(\min\{\sigma_{\text{max}} \ln T, \sqrt{d_{\text{tot}}}\})$, where $\sigma_{\text{max}}$ denotes the maximum number of outstanding observations and may be considerably smaller than $d_{\text{max}}$.