AI Summary
Existing continual reinforcement learning (CRL) frameworks are predominantly risk-neutral, optimizing only the expected long-term return while neglecting tail-risk exposure and policy stability. Method: This work systematically introduces risk measurement theory into CRL, identifying a fundamental incompatibility between classical static risk measures, such as Conditional Value-at-Risk (CVaR), and the non-stationary, task-evolving nature of continual learning. To address this, we propose Ergodic Risk Measures, grounded in ergodic process theory, to characterize long-term risk exposure under dynamic task sequences. We establish their theoretical consistency and stability, and develop a differentiable, risk-aware policy optimization algorithm. Contribution/Results: Experiments on multiple CRL benchmarks demonstrate significant improvements in policy robustness and long-term performance stability. Our approach provides the first principled theoretical foundation and practical paradigm for risk-aware continual reinforcement learning.
Abstract
Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected (or mean) long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the agent aims to optimize a reward-based measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with the continual setting. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures that are compatible with continual learning. Finally, we provide a case study of risk-aware continual learning, along with empirical results, which show the intuitive appeal and theoretical soundness of ergodic risk measures.
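To make the contrast between risk-neutral and risk-aware objectives concrete, the following minimal sketch compares the mean of a reward sample with a classical, static empirical CVaR (the average of the worst fraction of rewards). It illustrates only the kind of static risk measure that the abstract argues is incompatible with the continual setting; it is not the ergodic risk measure introduced in the paper. The function name `empirical_cvar`, the parameter `alpha`, and both reward lists are hypothetical and chosen purely for illustration.

```python
import math


def empirical_cvar(rewards, alpha):
    """Average of the worst ceil(alpha * n) rewards (lower-tail CVaR).

    Illustrative static estimator only; NOT the paper's ergodic risk measure.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not rewards:
        raise ValueError("rewards must be non-empty")
    ordered = sorted(rewards)                      # worst rewards first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the lower tail
    return sum(ordered[:k]) / k


# Two hypothetical reward samples with the same mean but different tails.
steady = [1.0] * 10
spiky = [2.0] * 9 + [-8.0]

mean_steady = sum(steady) / len(steady)
mean_spiky = sum(spiky) / len(spiky)
cvar_steady = empirical_cvar(steady, alpha=0.1)
cvar_spiky = empirical_cvar(spiky, alpha=0.1)

print(mean_steady, mean_spiky)   # identical means
print(cvar_steady, cvar_spiky)   # very different worst-case tails
```

A risk-neutral agent would be indifferent between the two samples because their means coincide, whereas a tail-sensitive objective separates them. How such an objective should be defined over an endless, non-stationary stream of experience is the question the ergodic risk measures are meant to address.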