🤖 AI Summary
In non-stationary online optimization, existing methods require prior knowledge of the discount factor λ ∈ [λ_min, 1), limiting adaptability to continuously varying environments.
Method: We propose the first adaptive discounting algorithm that operates without knowing λ a priori. It integrates (1) the Discounted-Normal-Predictor (DNP) mechanism, which enables provably robust aggregation of expert predictions across multiple λ values, and (2) a parallelized framework combining Smoothed Online Gradient Descent (SOGD) with heterogeneous predictor scheduling for real-time ensembling.
Contribution/Results: We establish a novel analytical framework for discounted regret, yielding a unified upper bound of O(√(log T / (1−λ))) for all λ ∈ [λ_min, 1). This eliminates reliance on prespecified λ, significantly enhancing robustness and practicality in dynamic settings while advancing theoretical understanding of adaptive discounting.
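For concreteness, a common formulation of λ-discounted regret over $T$ rounds (a paraphrase consistent with the bounds above, not a verbatim definition from the paper) weights each round's excess loss by a power of the discount factor:

```latex
\mathrm{Reg}_\lambda(T) \;=\; \sum_{t=1}^{T} \lambda^{T-t}\,\bigl(f_t(\mathbf{x}_t) - f_t(\mathbf{u})\bigr),
```

where $f_t$ is the round-$t$ loss, $\mathbf{x}_t$ the learner's decision, and $\mathbf{u}$ a fixed comparator. Since $\lambda^{T-t}$ is largest for $t$ near $T$, recent rounds dominate the sum, which is exactly the "graceful forgetting" of past data the abstract describes.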
📝 Abstract
Reflecting the greater significance of recent history over the distant past in non-stationary environments, $\lambda$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $\lambda$ is given, online gradient descent (OGD) with an appropriate step size achieves an $O(1/\sqrt{1-\lambda})$ discounted regret. However, the value of $\lambda$ is often not predetermined in real-world scenarios. This gives rise to a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we answer this question affirmatively by providing a novel analysis demonstrating that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-\lambda)})$ discounted regret, holding simultaneously for all values of $\lambda$ across a continuous interval. The basic idea is to maintain multiple OGD instances to handle different discount factors, and to aggregate their outputs sequentially with an online prediction algorithm named Discounted-Normal-Predictor (DNP) (Kapralov and Panigrahy, 2010). Our analysis reveals that DNP can combine the decisions of two experts even when they operate on discounted regret with different discount factors.
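The high-level idea, maintain several OGD instances tuned to different discount factors and combine their outputs online, can be sketched as follows. This is a simplified 1-D illustration under assumed details: the step sizes, the squared loss, and the exponential-weights meta-learner are stand-ins for illustration only; the paper's actual aggregation uses DNP, which is more involved.

```python
import numpy as np

# Simplified sketch: one OGD instance per candidate discount factor lambda,
# combined by an exponential-weights meta-learner (a stand-in for DNP).
lambdas = [0.9, 0.99, 0.999]       # candidate discount factors (assumed grid)
x = np.zeros(len(lambdas))         # one OGD iterate per lambda
w = np.ones(len(lambdas))          # meta-learner weights over the instances
eta_meta = 0.5                     # meta-learner rate (illustrative choice)
T = 200

for t in range(T):
    target = 1.0 if t < T // 2 else -1.0        # non-stationary environment
    decision = float(w @ x / w.sum())           # combined (weighted) decision
    inst = (x - target) ** 2                    # per-instance loss f_t(x) = (x - target)^2
    grads = 2.0 * (x - target)                  # gradients of f_t at each iterate
    for i, lam in enumerate(lambdas):
        # shorter memory (smaller lambda) -> larger step, faster adaptation
        x[i] -= np.sqrt(1.0 - lam) * grads[i]
    w *= np.exp(-eta_meta * inst)               # downweight poorly performing instances
    w /= w.sum()

print(decision)
```

After the environment shifts at round T/2, the instances with smaller λ adapt quickly and the meta-learner shifts weight toward them, so the combined decision tracks the new target; instances with λ near 1 adapt more slowly but would win in near-stationary phases.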