🤖 AI Summary
Diffusion policies for robotic manipulation suffer from representation collapse, wherein semantically similar observations are mapped to indistinguishable latent features, hindering discrimination of task-critical fine-grained differences. To address this, we propose D2PPO—a novel diffusion policy framework introducing hierarchical dispersive loss regularization: within each batch, all latent representations serve as mutual negatives, with differentiated weighting applied to early versus late network layers to enhance feature discriminability. Inspired by contrastive learning principles, D2PPO requires no additional data or annotations. On the RoboMimic benchmark, D2PPO achieves new state-of-the-art performance, improving pre-training and fine-tuning success rates by 22.7% and 26.1%, respectively. Furthermore, real-robot experiments on complex manipulation tasks demonstrate superior success rates over prior methods.
📝 Abstract
Diffusion policies excel at robotic manipulation by naturally modeling multimodal action distributions in high-dimensional spaces. Nevertheless, diffusion policies suffer from diffusion representation collapse: semantically similar observations are mapped to indistinguishable features, ultimately impairing their ability to handle the subtle but critical variations required for complex robotic manipulation. To address this problem, we propose D2PPO (Diffusion Policy Policy Optimization with Dispersive Loss). D2PPO introduces dispersive loss regularization that combats representation collapse by treating all hidden representations within each batch as mutual negatives. This compels the network to learn discriminative representations of similar observations, enabling the policy to identify the subtle yet crucial differences necessary for precise manipulation. In evaluation, we find that early-layer regularization benefits simple tasks, while late-layer regularization sharply enhances performance on complex manipulation tasks. On RoboMimic benchmarks, D2PPO achieves an average improvement of 22.7% in pre-training and 26.1% after fine-tuning, setting new SOTA results. Real-world experiments on a Franka Emika Panda robot show that our method attains markedly higher success rates than prior SOTA methods, with the advantage especially evident in complex tasks. Project page: https://guowei-zou.github.io/d2ppo/
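To make the core idea concrete, below is a minimal sketch of a batch-wise dispersive loss in the InfoNCE style without positive pairs, assuming a squared-Euclidean distance variant with a temperature `tau`. The function name and hyperparameters are illustrative, not taken from the paper's released code; the actual D2PPO implementation may differ in distance metric and layer placement.

```python
import numpy as np

def dispersive_loss(h, tau=0.5):
    """Batch-wise dispersive regularizer (illustrative sketch).

    Treats every pair of hidden representations in the batch as
    negatives: minimizing log-mean-exp of negative squared distances
    pushes the representations apart, counteracting collapse.

    h: (B, D) array of hidden features for one batch.
    tau: temperature controlling how strongly near pairs are penalized.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(h ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * h @ h.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values
    # Log-mean-exp over all ordered pairs; lower = more dispersed
    return np.log(np.mean(np.exp(-d2 / tau)))
```

A fully collapsed batch (identical features) gives a loss of 0, the maximum; spreading the features apart drives the loss negative, which is the direction the regularizer rewards.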