🤖 AI Summary
This work addresses the challenge of maintaining global trajectory consistency in long-horizon planning, where compositional diffusion models denoise overlapping sub-trajectories and enforcing only local behavior over long chains often fails to yield a coherent global structure. To overcome this limitation, the authors propose the eXtrinsic search-guided Diffuser (XDiffuser), which moves search outside the denoising process. Specifically, a plan is first computed over a state-space graph, which serves as a lightweight local connectivity oracle for the diffusion model; the plan then guides the denoising of a single trajectory. This design shifts the exploration burden from the compute-heavy diffusion model, which intrinsic search must evaluate repeatedly, to a graph search, and it allows classical algorithms to be applied to unseen combinatorial tasks at test time. XDiffuser outperforms diffusion-based baselines on long-horizon tasks, with particularly large gains in the low-quality data regime and on unseen tasks beyond goal-reaching, including multi-agent coordination and TSP-style reasoning.
📝 Abstract
Compositional diffusion models offer a promising route to long-horizon planning by denoising multiple overlapping sub-trajectories while ensuring that together they constitute a global solution. However, enforcing local behavior over long chains is often insufficient for a coherent global structure to emerge. Recent works tackle this limitation through intrinsic search, which explores multiple paths during the denoising process. While intrinsic search improves global coherence, it comes at the cost of repeated evaluations of an already compute-heavy model. In this work, we argue that extrinsic search, performed outside the denoising process, offers a more effective mode of exploration for long-horizon planning while naturally enabling the use of classical algorithms to solve unseen combinatorial tasks at test time. Our eXtrinsic search-guided Diffuser (XDiffuser) first computes a plan over a state-space graph -- serving as a lightweight local connectivity oracle for the diffusion model. The plan is then used to guide denoising for a single trajectory, effectively offloading the burden of exploration. XDiffuser outperforms diffusion-based baselines on long-horizon tasks, with particularly large gains in the low-quality data regime and on unseen tasks beyond goal-reaching, including multi-agent coordination and TSP-style reasoning. Project website: https://yanivhass.github.io/XDiffuser-site/
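The two-stage idea described in the abstract (search over a graph first, then guide the denoising of one trajectory) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the XDiffuser implementation: the grid-shaped state-space graph, the function names `grid_graph`, `shortest_path`, `resample`, and `guided_denoise`, and in particular the update rule, which replaces the learned denoiser with a simple pull toward the resampled plan plus annealed noise.

```python
import heapq
import random


def grid_graph(width, height, blocked=()):
    """Hypothetical state-space graph: 4-connected grid with unit edge costs."""
    blocked = set(blocked)
    graph = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in blocked:
                continue
            nbrs = {}
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (x + dx, y + dy)
                if 0 <= n[0] < width and 0 <= n[1] < height and n not in blocked:
                    nbrs[n] = 1.0
            graph[(x, y)] = nbrs
    return graph


def shortest_path(graph, start, goal):
    """Extrinsic search stage: Dijkstra over {node: {neighbor: cost}}."""
    dist = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(frontier, (nd, nbr))
    if goal not in dist:
        raise ValueError("goal unreachable")
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


def resample(plan, horizon):
    """Linearly interpolate a coarse plan (>= 2 nodes) to `horizon` >= 2 points."""
    out = []
    for t in range(horizon):
        s = t * (len(plan) - 1) / (horizon - 1)
        i = min(int(s), len(plan) - 2)
        a = s - i
        (x0, y0), (x1, y1) = plan[i], plan[i + 1]
        out.append((x0 + a * (x1 - x0), y0 + a * (y1 - y0)))
    return out


def guided_denoise(plan, horizon=16, steps=50, guidance=0.2, seed=0):
    """Toy guided denoising of a single trajectory.

    Assumption: a real system would combine a learned denoiser with the
    guidance signal; here the guidance pull is the only drift term.
    """
    rng = random.Random(seed)
    target = resample(plan, horizon)
    traj = [(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(horizon)]
    for k in range(steps):
        noise_scale = 0.5 * (1 - (k + 1) / steps)  # anneals to 0 at the last step
        traj = [
            (x + guidance * (tx - x) + noise_scale * rng.gauss(0, 1),
             y + guidance * (ty - y) + noise_scale * rng.gauss(0, 1))
            for (x, y), (tx, ty) in zip(traj, target)
        ]
    return traj


# Usage: a wall at x == 2 leaves a single gap at (2, 4).
wall = [(2, 0), (2, 1), (2, 2), (2, 3)]
plan = shortest_path(grid_graph(5, 5, blocked=wall), (0, 0), (4, 4))
trajectory = guided_denoise(plan)
```

In this sketch the graph search is run once and the trajectory is denoised once, so the exploration cost sits in `shortest_path` rather than in repeated calls to the (here simulated) diffusion model. Swapping `shortest_path` for another classical routine, such as a tour heuristic for a TSP-style task, would leave `guided_denoise` unchanged, which is the kind of test-time flexibility the abstract points to.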