🤖 AI Summary
Existing compositional diffusion-based planning methods suffer from mode averaging when the local plan distribution is multimodal, which yields trajectories that are locally infeasible and globally inconsistent. This work proposes RCD (Refining Compositional Diffusion), a training-free guidance method that uses the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans. Combined with a consistency term on overlapping segments, this guidance steers compositional sampling toward high-density, globally coherent long-horizon trajectories. On long-horizon OGBench tasks covering locomotion, object manipulation, and pixel-based observations, RCD consistently outperforms existing methods and mitigates mode averaging in multimodal settings.
📝 Abstract
Compositional diffusion planning generates long-horizon trajectories by stitching together overlapping short-horizon segments through score composition. However, when local plan distributions are multimodal, existing compositional methods suffer from mode-averaging, where averaging incompatible local modes leads to plans that are neither locally feasible nor globally coherent. We propose Refining Compositional Diffusion (RCD), a training-free guidance method that steers compositional sampling toward high-density, globally coherent plans. RCD leverages the self-reconstruction error of a pretrained diffusion model as a proxy for the log-density of composed plans, combined with an overlap consistency term that enforces consistency at segment boundaries. We show that the combined guidance concentrates sampling on high-density plans that mitigate mode-averaging. Experiments on challenging long-horizon tasks from OGBench, including locomotion, object manipulation, and pixel-based observations, demonstrate that RCD consistently outperforms existing methods.
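To make the two guidance terms concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical stand-in denoiser (`toy_denoiser`, which snaps a noisy segment to the nearest of a few known modes) in place of a pretrained diffusion model. The function names, the noise level `sigma`, and the weight `weight` are all illustrative assumptions. The sketch only evaluates an energy; it does not perform guided sampling.

```python
import numpy as np

def toy_denoiser(x_noisy, modes):
    """Hypothetical stand-in for a pretrained denoiser: return the nearest mode."""
    dists = [np.sum((x_noisy - m) ** 2) for m in modes]
    return modes[int(np.argmin(dists))]

def reconstruction_error(segment, modes, sigma=0.1, n_samples=8, seed=0):
    """Noise the segment, denoise it, and measure how far the result lands from
    the original segment. Low error is used here as a proxy for high density."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_samples):
        noisy = segment + sigma * rng.standard_normal(segment.shape)
        errs.append(np.sum((toy_denoiser(noisy, modes) - segment) ** 2))
    return float(np.mean(errs))

def overlap_penalty(segments, overlap):
    """Squared mismatch between the tail of each segment and the head of the next."""
    total = 0.0
    for a, b in zip(segments[:-1], segments[1:]):
        total += np.sum((a[-overlap:] - b[:overlap]) ** 2)
    return float(total)

def guidance_energy(segments, modes, overlap, weight=1.0):
    """Assumed combination: summed reconstruction error plus weighted overlap term."""
    recon = sum(reconstruction_error(s, modes) for s in segments)
    return recon + weight * overlap_penalty(segments, overlap)

if __name__ == "__main__":
    # Two incompatible local modes: a path going up and a path going down.
    up = np.linspace(0.0, 1.0, 4)[:, None]
    down = -up
    averaged = 0.5 * (up + down)  # the mode-averaged plan lies between both modes
    print("error on a true mode:   ", reconstruction_error(up, [up, down]))
    print("error on averaged plan: ", reconstruction_error(averaged, [up, down]))
```

In this toy setting the averaged plan receives a larger reconstruction error than either true mode, which is the signal the abstract describes using to steer sampling away from mode-averaged plans. A real system would differentiate such an energy with respect to the plan and add that gradient to the composed score during sampling.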