DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

📅 2025-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness of deep reinforcement learning (DRL) in uncertain environments, this paper proposes the Distributionally Robust Soft Actor-Critic (DR-SAC) algorithm. DR-SAC integrates distributionally robust optimization (DRO) into the SAC framework, yielding a robust soft policy iteration with a theoretical convergence guarantee. It further introduces a generative modeling approach for data-driven estimation of the nominal transition distribution, enabling offline robust learning when the dynamics are unknown. On standard continuous-control benchmarks, DR-SAC achieves up to 9.8× the average reward of the SAC baseline under common perturbations, and it improves on existing robust RL methods in computational efficiency and applicability to large-scale problems.

📝 Abstract
Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To solve this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designed to enhance the robustness of the state-of-the-art Soft Actor-Critic (SAC) algorithm. DR-SAC aims to maximize the expected value with entropy against the worst possible transition model lying in an uncertainty set. A distributionally robust version of the soft policy iteration is derived with a convergence guarantee. For settings where nominal distributions are unknown, such as offline RL, a generative modeling approach is proposed to estimate the required nominal distributions from data. Furthermore, experimental results on a range of continuous control benchmark tasks demonstrate our algorithm achieves up to $9.8$ times the average reward of the SAC baseline under common perturbations. Additionally, compared with existing robust reinforcement learning algorithms, DR-SAC significantly improves computing efficiency and applicability to large-scale problems.
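The worst-case objective described above, maximizing expected value over all transition models in an uncertainty set around a nominal model, has a well-known tractable dual when the set is defined by a KL-divergence ball (the abstract does not specify the divergence, so this is an illustrative assumption, not the authors' implementation): inf over models within KL radius ρ of the nominal equals sup over β > 0 of −β log E[exp(−V/β)] − βρ. A minimal NumPy sketch of that dual, with the function name and grid search over β chosen here for illustration:

```python
import numpy as np

def robust_value_kl(values, rho, betas=None):
    """Worst-case expectation of sampled next-state values over all
    distributions within KL-divergence `rho` of the empirical nominal,
    via the dual: sup_{beta>0} -beta * log E[exp(-V/beta)] - beta * rho.
    Grid search over the dual variable beta is illustrative, not optimal."""
    values = np.asarray(values, dtype=float)
    if betas is None:
        betas = np.logspace(-3, 3, 400)
    best = -np.inf
    for beta in betas:
        z = -values / beta
        # numerically stable log-mean-exp
        lme = np.max(z) + np.log(np.mean(np.exp(z - np.max(z))))
        best = max(best, -beta * lme - beta * rho)
    return best
```

The result always lies between the minimum and the nominal mean of the sampled values: as ρ → 0 it recovers the ordinary expectation, and larger ρ shifts it toward the worst sample, which is the robustness knob the uncertainty-set radius controls.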
Problem

Research questions and friction points this paper is trying to address.

Enhance RL robustness under environmental uncertainties
Maximize expected value against worst-case transition models
Improve computational efficiency for large-scale RL problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

DR-SAC enhances robustness via distributionally robust optimization
Generative modeling estimates nominal distributions for offline RL
Improves computational efficiency and scalability for large-scale robust RL