🤖 AI Summary
Conditional diffusion models suffer severe performance degradation under highly corrupted conditioning inputs (e.g., label noise, observation distortions), and existing robust methods fail in high-noise regimes. To address this, we propose a robust learning framework centered on a progressive optimization mechanism that jointly leverages pseudo-condition learning and temporal ensembling to dynamically estimate and rectify corrupted conditions. Furthermore, we introduce Reverse-time Diffusion Conditioning (RDC), a technique that reinforces the memorization effect across diffusion steps, enabling effective correction of condition signals even under extreme noise, where prior methods break down. Our framework unifies the modeling of conditional uncertainty across diffusion timesteps. Evaluated on class-conditional image generation and vision-based motor policy learning, it achieves state-of-the-art performance across multiple noise levels, significantly outperforming prior robust diffusion approaches.
📝 Abstract
Conditional diffusion models achieve generative controllability by incorporating external conditions. However, their performance degrades significantly under noisy conditions, such as corrupted labels in image generation or unreliable observations or states in control policy generation. This paper introduces a robust learning framework for handling extremely noisy conditions in conditional diffusion models. We empirically demonstrate that existing noise-robust methods fail when the noise level is high. To overcome this, we propose learning pseudo conditions as surrogates for clean conditions and refining them progressively via temporal ensembling. Additionally, we develop a Reverse-time Diffusion Condition (RDC) technique, which diffuses pseudo conditions to reinforce the memorization effect and further facilitate their refinement. Experimentally, our approach achieves state-of-the-art performance across a range of noise levels on both class-conditional image generation and visuomotor policy generation tasks. The code is available via the project page https://robustdiffusionpolicy.github.io
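The abstract does not give implementation details for the temporal-ensembling step, but the idea of progressively refining pseudo conditions can be illustrated with the standard exponential-moving-average update used in noisy-label learning. The sketch below is an assumption for illustration only (function names, the `alpha` momentum value, and the toy shapes are all hypothetical, not taken from the paper):

```python
import numpy as np

def update_pseudo_conditions(ensemble, current_pred, alpha=0.6):
    """Temporal-ensembling EMA update: blend the running pseudo-condition
    estimate with the model's current prediction for each sample."""
    return alpha * ensemble + (1.0 - alpha) * current_pred

# Toy setup: 3 samples, 4 condition classes (hypothetical sizes).
rng = np.random.default_rng(0)
ensemble = np.zeros((3, 4))  # running pseudo-condition estimates

for epoch in range(10):
    # Stand-in for the model's per-epoch condition predictions.
    preds = rng.random((3, 4))
    preds /= preds.sum(axis=1, keepdims=True)  # normalize to distributions
    ensemble = update_pseudo_conditions(ensemble, preds)

# Pseudo conditions used as surrogates for the (noisy) given conditions.
pseudo_labels = ensemble.argmax(axis=1)
```

Because the ensemble starts at zero, each row's mass approaches 1 at rate 1 - alpha^n, so early epochs are down-weighted while later (presumably better-trained) predictions dominate the pseudo condition.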