🤖 AI Summary
Offline diffusion policies are prone to solver errors, score-matching bias, and action inconsistency, particularly in data-scarce and continuous control settings. This work proposes Contractive Diffusion Policies (CDPs), which introduce contractive dynamics into the diffusion sampling process. By pulling nearby sampling trajectories toward one another, CDPs enhance robustness to approximation errors and reduce action variance. The approach requires no modification to the backbone architecture, incurs minimal computational overhead, and preserves both generality and stability. Extensive experiments in simulation and real-world environments show that CDPs outperform existing methods across multiple offline reinforcement learning benchmarks, with especially pronounced gains under data scarcity.
📝 Abstract
Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While such inaccuracies are less critical in image generation, they compound in continuous control settings and lead to failure. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.
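To make the notion of contraction concrete, here is a minimal sketch (not the paper's method; the drift, rate `lam`, and target `mu` are illustrative assumptions): a flow is contractive when two nearby states integrated under the same dynamics are pulled exponentially closer, which is the property CDPs induce in the sampling SDE to absorb solver and score-matching errors.

```python
import math

# Illustration only: dx/dt = -lam * (x - mu) is a simple contractive drift.
# Two nearby initial states integrated with Euler steps converge at rate
# roughly exp(-lam * T), where T = steps * dt.

def euler_flow(x0, mu=0.0, lam=2.0, dt=0.01, steps=500):
    x = x0
    for _ in range(steps):
        x += -lam * (x - mu) * dt  # contractive drift pulls x toward mu
    return x

xa, xb = 1.0, 1.5                  # two nearby initial samples
gap0 = abs(xa - xb)                # initial separation
gap_T = abs(euler_flow(xa) - euler_flow(xb))  # separation after the flow

# Contraction: the gap shrinks by roughly exp(-lam * T), so small
# perturbations (e.g. from solver or score errors) are damped out.
print(gap0, gap_T)
```

The same intuition carries over to the reverse-time sampling SDE: if the effective drift is contractive, perturbations injected at any step of the solver decay rather than accumulate along the trajectory.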