Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation

📅 2026-01-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes the Self-Imitated Diffusion Policy (SIDP), a novel approach that overcomes key limitations of conventional diffusion-based policies, which typically rely on expert demonstrations (and thus inherit their suboptimal trajectories) and require computationally expensive generate-and-filter pipelines at inference time. SIDP leverages a reward-guided self-imitation mechanism to selectively learn from high-quality trajectories among its own rollouts, eliminating dependence on expert data and post-hoc filtering. Combined with curriculum learning and goal-agnostic exploration, the method improves sample efficiency and generalization. Extensive experiments demonstrate that SIDP significantly outperforms existing approaches both in simulation and on real-world robotic platforms. Notably, it achieves an inference latency of only 110 ms on a Jetson Orin Nano, a 2.5× speedup over the NavDP baseline, enabling efficient real-time deployment.

Technology Category

Application Category

📝 Abstract
Diffusion policies (DP) have demonstrated significant potential in visual navigation by capturing diverse multi-modal trajectory distributions. However, standard imitation learning (IL), which most DP methods rely on for training, often inherits sub-optimality and redundancy from expert demonstrations, thereby necessitating a computationally intensive "generate-then-filter" pipeline that relies on auxiliary selectors during inference. To address these challenges, we propose the Self-Imitated Diffusion Policy (SIDP), a novel framework that learns improved planning by selectively imitating a set of trajectories sampled from itself. Specifically, SIDP introduces a reward-guided self-imitation mechanism that encourages the policy to consistently and efficiently produce high-quality trajectories, rather than outputs of inconsistent quality, thereby reducing reliance on extensive sampling and post-filtering. During training, we employ a reward-driven curriculum learning paradigm to mitigate inefficient data utilization, and goal-agnostic exploration for trajectory augmentation to improve planning robustness. Extensive evaluations on a comprehensive simulation benchmark show that SIDP significantly outperforms previous methods, with real-world experiments confirming its effectiveness across multiple robotic platforms. On a Jetson Orin Nano, SIDP delivers 2.5× faster inference than the baseline NavDP (110 ms vs. 273 ms), enabling efficient real-time deployment.
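The core loop the abstract describes (sample trajectories from the policy itself, score them with a reward, and imitate only the best) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `DummyPolicy`, `reward`, and `self_imitation_step` are all hypothetical stand-ins, and the actual method uses a diffusion denoising loss rather than the crude mean-shift update shown here.

```python
import random

class DummyPolicy:
    """Stand-in for a diffusion policy; samples noisy 1D waypoint lists."""
    def __init__(self):
        self.bias = 0.0  # crude stand-in for learnable parameters

    def sample_trajectory(self, horizon=5):
        # Sample a candidate trajectory from the current policy.
        return [self.bias + random.gauss(0, 1) for _ in range(horizon)]

    def update(self, trajectory):
        # Imitation update: nudge the policy toward the chosen rollout
        # (a real SIDP-style update would minimize a diffusion loss).
        self.bias += 0.1 * (sum(trajectory) / len(trajectory) - self.bias)

def reward(trajectory):
    # Toy reward: prefer waypoints near a goal located at 1.0.
    return -sum(abs(w - 1.0) for w in trajectory)

def self_imitation_step(policy, num_samples=8, top_k=2):
    """One self-imitation step: sample, rank by reward, imitate the best."""
    rollouts = [policy.sample_trajectory() for _ in range(num_samples)]
    rollouts.sort(key=reward, reverse=True)   # rank rollouts by reward
    for traj in rollouts[:top_k]:             # imitate only the top-k
        policy.update(traj)
    return reward(rollouts[0])

random.seed(0)
policy = DummyPolicy()
rewards = [self_imitation_step(policy) for _ in range(50)]
print(f"best-rollout reward: first={rewards[0]:.2f} last={rewards[-1]:.2f}")
```

Because only high-reward rollouts are imitated, the policy's own samples gradually concentrate near the goal, which is the mechanism that lets SIDP drop both expert data and the generate-then-filter step at inference.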
Problem

Research questions and friction points this paper is trying to address.

visual navigation
diffusion policy
imitation learning
trajectory optimization
real-time deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Imitation
Diffusion Policy
Reward-Guided Learning
Efficient Inference
Visual Navigation
Runhua Zhang
Uni-Ubi, Zhejiang University
Junyi Hou
National University of Singapore
Changxu Cheng
Uni-Ubi
Qiyi Chen
Uni-Ubi
Tao Wang
Uni-Ubi
Wuyue Zhao
Uni-Ubi