Distributionally Robust Self-Paced Curriculum Reinforcement Learning

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Reinforcement learning (RL) policies often fail when deployed in real-world environments due to train-test distributional shift. Conventional approaches fix the robustness budget ε, yet this static choice inherently trades off nominal performance against robustness: an overly small ε yields insufficient robustness, while an excessively large ε induces over-conservatism or instability. Method: We propose an adaptive robustness-budget curriculum learning framework that models ε as a continuous, learnable curriculum variable. The uncertainty set is dynamically expanded during training to progressively raise the robustness requirement. The method integrates distributionally robust optimization, self-paced learning, and adversarial worst-case training. Contribution/Results: Experiments across diverse tasks show an average 11.8% improvement in episode return, reaching 1.9× the performance of the corresponding nominal baseline algorithms. The approach significantly alleviates the robustness-performance trade-off and, for the first time, enables end-to-end curriculum scheduling of the robustness budget.
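The core idea of scheduling ε by learning progress can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact schedule: the class name, the pacing rule (grow ε only when the recent return stays close to the best return seen), and all thresholds and step sizes are assumptions for illustration.

```python
class SelfPacedEpsilonScheduler:
    """Hypothetical self-paced curriculum over the robustness budget eps.

    The uncertainty set is expanded (eps increased) only when the agent's
    episode return clears a pacing threshold relative to the best return
    observed so far, so robustness demands track learning progress.
    """

    def __init__(self, eps_init=0.0, eps_max=0.5, step=0.05,
                 return_threshold=0.8):
        self.eps = eps_init
        self.eps_max = eps_max
        self.step = step
        # Fraction of the best return the agent must reach before eps grows.
        self.return_threshold = return_threshold
        self.best_return = float("-inf")

    def update(self, episode_return):
        """Expand the uncertainty set only once progress is sufficient."""
        self.best_return = max(self.best_return, episode_return)
        if episode_return >= self.return_threshold * self.best_return:
            self.eps = min(self.eps + self.step, self.eps_max)
        return self.eps
```

A fixed ε corresponds to `step=0.0`; the curriculum generalizes it by letting the budget ramp up as training stabilizes.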

📝 Abstract
A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts at deployment into real-world environments. Distributionally Robust Reinforcement Learning (DRRL) addresses this by optimizing for worst-case performance within an uncertainty set defined by a robustness budget $\epsilon$. However, fixing $\epsilon$ results in a tradeoff between performance and robustness: small values yield high nominal performance but weak robustness, while large values can result in instability and overly conservative policies. We propose Distributionally Robust Self-Paced Curriculum Reinforcement Learning (DR-SPCRL), a method that overcomes this limitation by treating $\epsilon$ as a continuous curriculum. DR-SPCRL adaptively schedules the robustness budget according to the agent's progress, enabling a balance between nominal and robust performance. Empirical results across multiple environments demonstrate that DR-SPCRL not only stabilizes training but also achieves a superior robustness-performance trade-off, yielding an average 11.8% increase in episodic return under varying perturbations compared to fixed or heuristic scheduling strategies, and achieving approximately 1.9$\times$ the performance of the corresponding nominal RL algorithms.
Problem

Research questions and friction points this paper is trying to address.

Addresses reinforcement learning policy failure under environmental distribution shifts
Overcomes fixed robustness budget limitations via adaptive curriculum scheduling
Balances nominal performance and robustness against environmental perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptively schedules robustness budget as curriculum
Balances nominal and robust performance adaptively
Stabilizes training and improves robustness-performance trade-off
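The adversarial worst-case component paired with the curriculum can be illustrated with a small sketch. This is an assumed simplification of the inner step in distributionally robust training, not the paper's exact procedure: environment parameters are perturbed within the current budget `eps` and the policy is evaluated against the worst sampled case. The function and parameter names are hypothetical.

```python
import random

def worst_case_return(evaluate, nominal_params, eps, n_samples=8, rng=None):
    """Minimum evaluated return over eps-bounded parameter perturbations.

    evaluate: callable mapping a parameter dict to an episode return.
    nominal_params: dict of nominal environment parameters.
    eps: current robustness budget (max relative perturbation magnitude).
    """
    rng = rng or random.Random(0)
    worst = evaluate(nominal_params)  # always include the nominal environment
    for _ in range(n_samples):
        # Sample a perturbed environment inside the eps-ball around nominal.
        perturbed = {
            k: v * (1.0 + rng.uniform(-eps, eps))
            for k, v in nominal_params.items()
        }
        worst = min(worst, evaluate(perturbed))
    return worst
```

As the curriculum raises `eps`, the sampled perturbations grow and the worst-case objective becomes progressively harder, which is the mechanism behind the gradual expansion of the uncertainty set described above.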