NFPO: Stabilized Policy Optimization of Normalizing Flow for Robotic Policy Learning

📅 2026-03-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional deep reinforcement learning approaches that rely on multivariate Gaussian policies, which struggle to effectively model multimodal action distributions commonly encountered in robotic tasks. To overcome this, the paper presents the first stable integration of normalizing flows into online robotic policy learning, introducing a simple yet effective mechanism to stabilize training, a longstanding challenge for normalizing flows in online reinforcement learning settings. By doing so, the method transcends the representational constraints of Gaussian policies and enables efficient learning of complex, multimodal behaviors. Experimental results demonstrate that the proposed approach achieves robust and superior performance across multiple simulated environments and successfully transfers to real-world robotic systems.

📝 Abstract
Deep Reinforcement Learning (DRL) has advanced significantly in recent years and is now widely used across many fields. In DRL-based robotic policy learning, however, the de facto policy parameterization is still a multivariate Gaussian with a diagonal covariance matrix, which cannot model multi-modal distributions. In this work, we explore Normalizing Flow (NF), a modern network architecture, as the policy parameterization because it can model multi-modal distributions, provides a closed-form log probability, and has low computation and memory overhead. However, naively training an NF in online Reinforcement Learning (RL) usually leads to training instability. We provide a detailed analysis of this phenomenon and address it with a simple but effective technique. Through extensive experiments in multiple simulation environments, we show that our method, NFPO, obtains robust and strong performance on widely used robotic learning tasks and transfers successfully to real-world robots.
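
To make the "closed form of log probability" property concrete, the following is a minimal NumPy sketch of a single observation-conditioned affine coupling layer. It is an illustration under stated assumptions, not the NFPO architecture or its stabilization technique, neither of which is described on this page. The function names, layer sizes, and the tanh bound on the log-scale are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_coupling(obs_dim, act_dim, hidden=16):
    """Random weights for one conditional affine coupling layer (hypothetical sizes)."""
    half = act_dim // 2
    return {
        "W1": rng.normal(0.0, 0.1, (obs_dim + half, hidden)),
        "b1": np.zeros(hidden),
        "Ws": rng.normal(0.0, 0.1, (hidden, act_dim - half)),
        "Wt": rng.normal(0.0, 0.1, (hidden, act_dim - half)),
    }

def _scale_shift(params, obs, x1):
    """Conditioner network: predicts log-scale and shift from obs and the untouched half."""
    h = np.tanh(np.concatenate([obs, x1], axis=-1) @ params["W1"] + params["b1"])
    log_scale = np.tanh(h @ params["Ws"])  # bounded log-scale (an assumption of this sketch)
    shift = h @ params["Wt"]
    return log_scale, shift

def coupling_forward(params, obs, z):
    """Map base noise z to an action a; also return log|det da/dz|."""
    half = z.shape[-1] // 2
    z1, z2 = z[..., :half], z[..., half:]
    log_scale, shift = _scale_shift(params, obs, z1)
    a2 = z2 * np.exp(log_scale) + shift
    return np.concatenate([z1, a2], axis=-1), log_scale.sum(axis=-1)

def coupling_inverse(params, obs, a):
    """Map an action a back to base noise z; also return log|det da/dz|."""
    half = a.shape[-1] // 2
    a1, a2 = a[..., :half], a[..., half:]
    log_scale, shift = _scale_shift(params, obs, a1)
    z2 = (a2 - shift) * np.exp(-log_scale)
    return np.concatenate([a1, z2], axis=-1), log_scale.sum(axis=-1)

def log_prob(params, obs, a):
    """Exact log pi(a | obs) via change of variables with a standard normal base."""
    z, log_det = coupling_inverse(params, obs, a)
    dim = z.shape[-1]
    base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return base - log_det

# Usage: sample actions for a batch of observations and score them exactly.
obs = np.ones((4, 3))
params = init_coupling(obs_dim=3, act_dim=4)
z = rng.normal(size=(4, 4))
actions, _ = coupling_forward(params, obs, z)
print(log_prob(params, obs, actions).shape)
```

Because the Jacobian of a coupling layer is triangular, its log-determinant is just the sum of the predicted log-scales, so the density needs no numerical integration. A practical flow policy would stack several such layers with permutations between them so that every action dimension gets transformed.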
Problem

Research questions and friction points this paper is trying to address.

Normalizing Flow
Policy Optimization
Multi-modal Distribution
Training Instability
Robotic Policy Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing Flow
Policy Optimization
Multi-modal Policy
Reinforcement Learning
Robotic Learning