🤖 AI Summary
This paper studies distributed online convex optimization with compressed communication, where existing regret bounds degrade quadratically or even quartically in the inverse compression quality factor $\omega^{-1}$ and super-linearly in the number of learners $n$. The authors propose a two-level blocking update framework that combines an online gossip strategy with an error compensation scheme, improving the regret to $\tilde{O}(\omega^{-1/2}\rho^{-1}n\sqrt{T})$ for convex losses and $\tilde{O}(\omega^{-1}\rho^{-2}n\ln{T})$ for strongly convex losses. They also establish the first lower bounds for this setting, showing their results are optimal in both $\omega$ and $T$, and extend the method to bandit feedback using classic gradient estimators.
📝 Abstract
We investigate distributed online convex optimization with compressed communication, where $n$ learners connected by a network collaboratively minimize a sequence of global loss functions using only local information and compressed data from their neighbors. Prior work has established regret bounds of $O(\max\{\omega^{-2}\rho^{-4}n^{1/2},\omega^{-4}\rho^{-8}\}n\sqrt{T})$ and $O(\max\{\omega^{-2}\rho^{-4}n^{1/2},\omega^{-4}\rho^{-8}\}n\ln{T})$ for convex and strongly convex functions, respectively, where $\omega\in(0,1]$ is the compression quality factor ($\omega=1$ means no compression) and $\rho<1$ is the spectral gap of the communication matrix. However, these regret bounds suffer from a quadratic or even quartic dependence on $\omega^{-1}$. Moreover, the super-linear dependence on $n$ is also undesirable. To overcome these limitations, we propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(\omega^{-1/2}\rho^{-1}n\sqrt{T})$ and $\tilde{O}(\omega^{-1}\rho^{-2}n\ln{T})$ for convex and strongly convex functions, respectively. The key idea is a two-level blocking update framework with two novel ingredients, an online gossip strategy and an error compensation scheme, which work in concert to achieve better consensus among learners. Furthermore, we establish the first lower bounds for this problem, justifying the optimality of our results with respect to both $\omega$ and $T$. Additionally, we consider the bandit feedback scenario and extend our method with classic gradient estimators to improve upon existing regret bounds.
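To make the compression quality factor $\omega$ concrete, the following is a minimal sketch (not the paper's algorithm) of a standard $\omega$-contractive compressor, rand-$k$ sparsification, which keeps $k$ of $d$ coordinates and satisfies $\mathbb{E}\|\mathcal{C}(x)-x\|^2 = (1-\omega)\|x\|^2$ with $\omega = k/d$. The function name `rand_k_compress` is illustrative, not from the paper.

```python
import numpy as np

def rand_k_compress(x, k, rng):
    """Contractive rand-k sparsifier: keep k uniformly random coordinates
    of x and zero out the rest. This satisfies the standard contraction
    property E||C(x) - x||^2 = (1 - k/d) ||x||^2, i.e. omega = k/d."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
d, k = 100, 10                      # omega = k/d = 0.1
x = rng.standard_normal(d)

# Empirically verify the contraction factor (1 - omega) over many draws.
errs = [np.sum((rand_k_compress(x, k, rng) - x) ** 2) for _ in range(20000)]
ratio = np.mean(errs) / np.sum(x ** 2)
print(f"empirical 1 - omega: {ratio:.3f}   exact: {1 - k / d:.3f}")
```

Each coordinate is dropped with probability $1 - k/d$, so the expected squared compression error is exactly $(1-k/d)\|x\|^2$; smaller $\omega$ means more aggressive compression, which is precisely the regime where the dependence on $\omega^{-1}$ in the regret bounds matters.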