A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge in online diffusion policy reinforcement learning (Online DPRL) where training objectives and policy improvement mechanisms are often misaligned, hindering scalable robotic control. We propose the first taxonomy of Online DPRL algorithms based on their policy improvement mechanisms, categorizing them into four classes: Action-Gradient, Q-Weighting, Proximity-Based, and BPTT. Leveraging the NVIDIA Isaac Lab platform, we conduct a unified benchmark across twelve diverse robotic tasks and systematically evaluate these methods along five critical dimensions: task diversity, parallelizability, diffusion-step scalability, cross-embodiment generalization, and environmental robustness. Our analysis reveals fundamental trade-offs between sample efficiency and scalability, identifies key bottlenecks limiting real-world deployment, and provides practical guidance for algorithm selection while highlighting promising directions for future research.
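The paper provides no code here, but the Action-Gradient family named in the taxonomy can be illustrated with a toy sketch: the policy is improved by ascending the critic's gradient with respect to the sampled action (in a real method this gradient would be chain-ruled back into the diffusion policy's parameters). Everything below is a hypothetical stand-in, not the paper's implementation; the quadratic `q_value` critic and direct action updates are assumptions for illustration only.

```python
import numpy as np

def q_value(action):
    # Hypothetical quadratic critic with its optimum at a* = 0.5.
    return -np.sum((action - 0.5) ** 2)

def q_grad(action):
    # Analytic action gradient dQ/da of the toy critic above.
    return -2.0 * (action - 0.5)

# Pretend this action was produced by the diffusion policy's sampler.
action = np.array([0.0, 1.0])
lr = 0.1
for _ in range(50):
    # Action-Gradient mechanism: nudge the sampled action along dQ/da
    # (a real method would propagate this into policy parameters).
    action = action + lr * q_grad(action)
print(np.round(action, 3))  # approaches the critic optimum at 0.5
```

Each step shrinks the distance to the critic's optimum by a constant factor, so the action converges geometrically; the other three families in the taxonomy avoid this direct gradient by reweighting, constraining, or unrolling the sampler instead.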

📝 Abstract
Diffusion policies have emerged as a powerful approach for robotic control, demonstrating superior expressiveness in modeling multimodal action distributions compared to conventional policy networks. However, their integration with online reinforcement learning remains challenging due to fundamental incompatibilities between diffusion model training objectives and standard RL policy improvement mechanisms. This paper presents the first comprehensive review and empirical analysis of current Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems. We propose a novel taxonomy that categorizes existing approaches into four distinct families based on their policy improvement mechanisms: Action-Gradient, Q-Weighting, Proximity-Based, and Backpropagation Through Time (BPTT) methods. Through extensive experiments on a unified NVIDIA Isaac Lab benchmark encompassing 12 diverse robotic tasks, we systematically evaluate representative algorithms across five critical dimensions: task diversity, parallelization capability, diffusion step scalability, cross-embodiment generalization, and environmental robustness. Our analysis identifies key findings regarding the fundamental trade-offs inherent in each algorithmic family, particularly concerning sample efficiency and scalability. Furthermore, we reveal critical computational and algorithmic bottlenecks that currently limit the practical deployment of online DPRL. Based on these findings, we provide concrete guidelines for algorithm selection tailored to specific operational constraints and outline promising future research directions to advance the field toward more general and scalable robotic learning systems.
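Of the four families in the abstract's taxonomy, Q-Weighting methods are perhaps the simplest to sketch: they keep the standard diffusion denoising loss but reweight each sample in the batch by its critic value, so high-value actions dominate the policy update. The snippet below is an illustrative sketch only; the stand-in `q_value` critic, the softmax weighting with a `temperature` parameter, and the toy batch are all assumptions, not the paper's (or any surveyed algorithm's) exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(states, actions):
    # Hypothetical critic: stands in for a learned Q-network.
    return -np.sum((actions - 0.5) ** 2, axis=-1)

def q_weighted_denoising_loss(states, actions, noise_pred, noise, temperature=1.0):
    """Q-Weighting sketch: scale each sample's standard diffusion
    denoising loss by a softmax over exp(Q / temperature), so that
    high-value actions contribute most to the policy update."""
    q = q_value(states, actions)
    # Subtract the max before exponentiating for numerical stability.
    w = np.exp((q - q.max()) / temperature)
    w /= w.sum()
    per_sample = np.mean((noise_pred - noise) ** 2, axis=-1)
    return np.sum(w * per_sample)

# Toy batch: 4 states, 2-D actions, and a near-perfect denoiser.
states = rng.normal(size=(4, 3))
actions = rng.uniform(size=(4, 2))
noise = rng.normal(size=(4, 2))
noise_pred = noise + 0.1 * rng.normal(size=(4, 2))
loss = q_weighted_denoising_loss(states, actions, noise_pred, noise)
print(float(loss))
```

Because the weighted objective never differentiates through the sampler, this family trades the direct gradient signal of Action-Gradient and BPTT methods for a cheaper, more parallelizable update, which is exactly the kind of trade-off the benchmark's five evaluation dimensions are designed to expose.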
Problem

Research questions and friction points this paper is trying to address.

diffusion policy · online reinforcement learning · robotic control · scalability · policy improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Policy · Online Reinforcement Learning · Robotic Control · Algorithm Taxonomy · Scalability
Wonhyeok Choi, Minwoo Choi, Jungwan Woo, Kyumin Hwang, Jaeyeul Kim
Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, South Korea

Sunghoon Im
EECS, DGIST
Computer Vision · Deep Learning · Robot Vision