🤖 AI Summary
This work addresses the challenge posed by time-varying system dynamics, such as those caused by wear or drifting operating conditions, which render conventional reinforcement learning methods ineffective. To tackle this, the paper proposes a model-based reinforcement learning approach tailored to non-stationary environments. The method pairs a Gaussian process dynamics model with an adaptive data-buffering mechanism that explicitly limits the influence of outdated experiences, enabling well-calibrated uncertainty estimation. Notably, it is the first to bring variation-budget non-stationarity analysis into a model-based control framework, yielding dynamic regret guarantees. Empirical evaluations on multiple time-varying continuous control benchmarks show that the proposed algorithm significantly outperforms existing approaches, confirming its robustness and efficiency in non-stationary settings.
📝 Abstract
Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting in which an agent repeatedly learns and controls a dynamical system whose transition dynamics evolve across episodes. We analyze the problem using Gaussian process dynamics models under frequentist variation-budget assumptions. Our analysis shows that persistent non-stationarity requires explicitly limiting the influence of outdated data to maintain calibrated uncertainty and meaningful dynamic regret guarantees. Motivated by these insights, we propose a practical optimistic model-based reinforcement learning algorithm with an adaptive data-buffering mechanism and demonstrate improved performance on continuous control benchmarks with non-stationary dynamics.
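The core mechanism, a Gaussian process model whose training set is a bounded buffer that evicts stale data, can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the fixed-size sliding window, the squared-exponential kernel, and all names (`SlidingWindowGP`, `rbf_kernel`) are assumptions chosen for clarity; the paper's actual buffering rule is described only as "adaptive".

```python
from collections import deque

import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between row-stacked input matrices."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale ** 2)


class SlidingWindowGP:
    """GP regression over a bounded buffer of recent observations.

    Evicting old data caps the influence of transitions gathered under an
    earlier dynamics regime, which is what keeps the posterior uncertainty
    meaningful when the underlying dynamics drift across episodes.
    """

    def __init__(self, window_size=100, noise=1e-2, length_scale=1.0):
        self.buffer = deque(maxlen=window_size)  # oldest entries evicted first
        self.noise = noise
        self.length_scale = length_scale

    def add(self, x, y):
        self.buffer.append((np.atleast_1d(np.asarray(x, dtype=float)), float(y)))

    def predict(self, x_star):
        """Posterior mean and standard deviation at a query input."""
        X = np.stack([x for x, _ in self.buffer])
        y = np.array([t for _, t in self.buffer])
        # Standard GP posterior via Cholesky factorization of (K + noise * I).
        K = rbf_kernel(X, X, self.length_scale) + self.noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        k_star = rbf_kernel(np.atleast_2d(x_star), X, self.length_scale)[0]
        mean = k_star @ alpha
        v = np.linalg.solve(L, k_star)
        var = 1.0 - v @ v  # k(x*, x*) = 1 for the RBF kernel
        return mean, np.sqrt(max(var, 0.0))
```

With a bounded window, observations from a superseded regime are eventually discarded, so predictions track the current dynamics instead of averaging over old and new regimes; an unbounded buffer would do the latter, which is precisely the failure mode the analysis warns against.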