🤖 AI Summary
In safety-critical reinforcement learning, agents face unknown environments and must dynamically adapt risk preferences—challenging conventional static or manually tuned risk policies.
Method: We propose a risk-aware decision-making framework that jointly models epistemic and aleatoric uncertainty online. Our approach integrates total variation minimization with Follow-The-Leader online optimization to enable provably convergent, adaptive risk-level selection. It unifies distributional RL, uncertainty quantification, online convex optimization, and satisficing loss design, augmented by total variation regularization.
Contribution/Results: Unlike fixed-risk baselines or hand-tuned adaptive methods, our framework autonomously adjusts risk sensitivity in response to environmental feedback. Experiments across diverse tasks demonstrate significant improvements in both policy robustness and sample efficiency, while ensuring theoretical convergence guarantees for dynamic risk calibration.
📝 Abstract
One of the main challenges in reinforcement learning (RL) is that the agent has to make decisions that would influence the future performance without having complete knowledge of the environment. Dynamically adjusting the level of epistemic risk during the learning process can help to achieve reliable policies in safety-critical settings with better efficiency. In this work, we propose a new framework, Distributional RL with Online Risk Adaptation (DRL-ORA). This framework quantifies both epistemic and implicit aleatory uncertainties in a unified manner and dynamically adjusts the epistemic risk levels by solving a total variation minimization problem online. The selection of risk levels is performed efficiently via a grid search using a Follow-The-Leader-type algorithm, where the offline oracle corresponds to a"satisficing measure"under a specially modified loss function. We show that DRL-ORA outperforms existing methods that rely on fixed risk levels or manually designed risk level adaptation in multiple classes of tasks.