🤖 AI Summary
Diffusion policies have demonstrated strong performance in visuomotor control but are prone to failure under severe out-of-distribution (OOD) perturbations, such as object displacement or visual corruption. This work proposes Dream Diffusion Policy (DDP), which jointly optimizes a diffusion world model and a diffusion policy through a shared 3D vision encoder, endowing the policy with robust state prediction capabilities. During inference, DDP leverages autoregressive latent dynamics to perform “imagined” decision-making. It further incorporates a reality–imagination discrepancy detection mechanism that, under OOD conditions, actively disregards corrupted visual inputs and relies instead on internal predictions. Experiments show that DDP achieves a 73.8% success rate under OOD settings in MetaWorld—substantially outperforming the baseline at 23.9%—and reaches 83.3% in real-world scenarios with severe spatial shifts, compared to the baseline’s 3.3%. Remarkably, DDP maintains a 76.7% success rate even in fully open-loop execution.
📝 Abstract
Diffusion policies excel at visuomotor control but often fail catastrophically under severe out-of-distribution (OOD) disturbances, such as unexpected object displacements or visual corruptions. To address this vulnerability, we introduce the Dream Diffusion Policy (DDP), a framework that deeply integrates a diffusion world model into the policy's training objective via a shared 3D visual encoder. This co-optimization endows the policy with robust state-prediction capabilities. When encountering sudden OOD anomalies during inference, DDP detects the reality–imagination discrepancy and actively abandons the corrupted visual stream. Instead, it relies on its internal "imagination" (autoregressively forecasted latent dynamics) to safely bypass the disruption, generating imagined trajectories before smoothly realigning with physical reality. Extensive evaluations demonstrate DDP's exceptional resilience. Notably, DDP achieves a 73.8% OOD success rate on MetaWorld (vs. 23.9% without predictive imagination) and an 83.3% success rate under severe real-world spatial shifts (vs. 3.3% without predictive imagination). Furthermore, as a stress test, DDP maintains a 76.7% real-world success rate even when relying entirely on open-loop imagination post-initialization.
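The abstract's switching mechanism — comparing the observed latent against the world model's forecast and falling back on imagination when they diverge — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder and world model are toy stand-ins, and the L2 discrepancy score and threshold `TAU` are assumptions for the sake of the example.

```python
import numpy as np

TAU = 0.5  # assumed discrepancy threshold (illustrative hyperparameter)

def encode(obs):
    """Stand-in for the shared 3D vision encoder; here just a vector cast."""
    return np.asarray(obs, dtype=float)

def world_model_step(z, action):
    """Stand-in for the diffusion world model's autoregressive latent forecast.

    Toy linear dynamics for illustration only.
    """
    return z + np.asarray(action, dtype=float)

def select_latent(z_pred, obs):
    """Gate between the observed latent and the imagined one.

    Under OOD corruption the observed latent diverges from the forecast,
    so the policy ignores the visual stream and trusts its imagination.
    Returns (latent, used_imagination).
    """
    z_obs = encode(obs)
    discrepancy = np.linalg.norm(z_obs - z_pred)
    if discrepancy > TAU:
        return z_pred, True   # corrupted input: rely on internal prediction
    return z_obs, False       # consistent input: rely on reality
```

In a closed loop, `world_model_step` would roll the latent forward from the last trusted state each step, so that once observations re-agree with the forecast the policy smoothly realigns with reality.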