🤖 AI Summary
This paper addresses real-time model tracking under time-varying optimization in online learning from streaming data. Unlike conventional static optimization, we propose a structured weight-based modeling framework that explicitly formulates the objective as a weighted average of historical losses. We systematically analyze two weighting schemes: uniform and geometrically discounted. Theoretically, we derive the first tight tracking error bounds: under uniform weighting, the error converges to zero at rate $O(1/t)$; under geometric discounting, it converges to a controllable nonzero steady-state error. Algorithmically, we integrate gradient-based updates to enable efficient online adaptation. Numerical experiments validate both the theoretical bounds and the empirical effectiveness of the proposed approach.
📝 Abstract
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.