How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?

📅 2026-02-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm that integrates the augmented Lagrangian method with diffusion-based policy learning to address the challenge of ensuring online safety in reinforcement learning. While existing diffusion-based approaches primarily focus on offline reward maximization and often neglect safety constraints during deployment, ALGD leverages the Lagrangian function as an energy landscape to guide the denoising process. By incorporating augmented terms that locally convexify the otherwise non-convex energy surface, the method stabilizes the generation and training of safe policies without altering the optimal policy distribution. Grounded in energy-based modeling, duality theory, and non-convex optimization, ALGD demonstrates theoretically principled and empirically effective performance across diverse environments, achieving a robust balance between safety and optimality.
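To make the guidance idea in the summary concrete, the following is a minimal sketch in generic notation, not the paper's own: the reward critic Q_r, cost critic Q_c, budget d, multiplier λ, and penalty weight ρ are illustrative placeholders.

```latex
% Illustrative augmented-Lagrangian energy over actions (lower = better).
% The quadratic rho-term is the "augmented" part: it locally convexifies the
% landscape near the constraint boundary without moving the constrained optimum.
E_{\lambda,\rho}(s,a) = -\,Q_r(s,a) + \lambda\big(Q_c(s,a) - d\big)
    + \tfrac{\rho}{2}\big[\max\{0,\; Q_c(s,a) - d\}\big]^{2}

% An energy-guided (Langevin-style) denoising update, as one way to read
% "the Lagrangian guides the denoising process":
a_{k-1} = a_k - \eta\,\nabla_{a} E_{\lambda,\rho}(s, a_k) + \sqrt{2\eta}\,\epsilon_k,
    \qquad \epsilon_k \sim \mathcal{N}(0, I)
```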

📝 Abstract
Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based models, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, using the Lagrangian directly as guidance destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, stabilizing policy generation and training without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.
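As one way to picture the energy-guided denoising the abstract describes, here is a minimal Python sketch under stated assumptions: `q_reward` and `q_cost` are hypothetical critic networks, `denoiser` is a hypothetical noise-conditioned action model, and the update is a generic augmented-Lagrangian-guided reverse-diffusion step, not the paper's exact algorithm.

```python
import torch

def augmented_lagrangian_energy(q_reward, q_cost, state, action, lam, rho, d):
    """Illustrative energy: low energy = high reward and small constraint violation.

    lam is the dual variable, rho the penalty weight, d the cost budget.
    The rho-term locally convexifies the landscape near the constraint boundary.
    """
    violation = q_cost(state, action) - d
    return (-q_reward(state, action)
            + lam * violation
            + 0.5 * rho * torch.clamp(violation, min=0.0) ** 2)

def guided_denoise_step(denoiser, q_reward, q_cost, state, action_t, t,
                        lam, rho, d, guidance_scale=1.0):
    """One reverse-diffusion step nudged by the energy gradient (generic sketch,
    analogous to classifier guidance; not the paper's exact update rule)."""
    a = action_t.detach().requires_grad_(True)
    energy = augmented_lagrangian_energy(q_reward, q_cost, state, a, lam, rho, d).sum()
    grad = torch.autograd.grad(energy, a)[0]

    with torch.no_grad():
        a_pred = denoiser(state, action_t, t)   # base denoising prediction
        return a_pred - guidance_scale * grad   # steer toward low-energy (safe, rewarding) actions
```

In a full primal-dual loop, the multiplier lam would itself be updated by gradient ascent on the observed constraint violation; the rho penalty is what keeps that outer loop stable when the underlying Lagrangian landscape is non-convex.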
Problem

Research questions and friction points this paper is trying to address.

safe reinforcement learning
diffusion models
Lagrangian
online settings
policy stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Policy
Safe Reinforcement Learning
Augmented Lagrangian
Energy-Based Model
Primal-Dual Optimization
🔎 Similar Papers
No similar papers found.
Xiaoyuan Cheng
University College London
Wenxuan Yuan
Imperial College London
Boyang Li
University of California, San Diego
Yuanchao Xu
Kyoto University
Data-driven dynamical systems
Yiming Yang
University College London
Hao Liang
King's College London
Bei Peng
Lecturer (Assistant Professor), University of Sheffield
Machine Learning, Reinforcement Learning, Interactive Learning, Multi-Agent Systems
R. Loftin
University of Sheffield
Zhuo Sun
Australian National University
Wireless Communications
Yukun Hu
University College London