🤖 AI Summary
To address the limited robustness of deep reinforcement learning (DRL) in uncertain environments, this paper proposes the Distributionally Robust Soft Actor-Critic (DR-SAC) algorithm. DR-SAC is the first to integrate distributionally robust optimization (DRO) into the SAC framework, enabling robust soft policy iteration with theoretical convergence guarantees. It further introduces a generative modeling approach for data-driven estimation of the nominal transition distribution, facilitating offline robust learning under unknown dynamics. Evaluated on standard continuous-control benchmarks, DR-SAC achieves an average reward 9.8× higher than the SAC baseline under model mismatch and environmental perturbations. It significantly outperforms existing robust RL methods while incurring lower computational overhead and demonstrating strong scalability.
📝 Abstract
Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To solve this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designed to enhance the robustness of the state-of-the-art Soft Actor-Critic (SAC) algorithm. DR-SAC aims to maximize the expected value with entropy against the worst possible transition model lying in an uncertainty set. A distributionally robust version of the soft policy iteration is derived with a convergence guarantee. For settings where nominal distributions are unknown, such as offline RL, a generative modeling approach is proposed to estimate the required nominal distributions from data. Furthermore, experimental results on a range of continuous control benchmark tasks demonstrate our algorithm achieves up to $9.8$ times the average reward of the SAC baseline under common perturbations. Additionally, compared with existing robust reinforcement learning algorithms, DR-SAC significantly improves computing efficiency and applicability to large-scale problems.