Non-differentiable Reward Optimization for Diffusion-based Autonomous Motion Planning

📅 2025-07-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Diffusion models can capture multimodal motion trajectories but suffer from a likelihood-based training objective that cannot directly optimize non-differentiable downstream metrics such as safety and goal-reaching rate. To address this, we propose DiffRL, a novel framework integrating diffusion modeling with reinforcement learning. DiffRL introduces a reward-weighted dynamic thresholding algorithm to construct dense, differentiable proxy reward signals, enabling end-to-end optimization of diffusion policies with respect to non-differentiable safety and efficacy objectives for the first time. Crucially, our method requires no architectural modifications to the diffusion model and is fully compatible with arbitrary pre-trained diffusion policies. Evaluated on CrowdNav and ETH-UCY benchmarks, DiffRL significantly outperforms differentiable-loss-based baselines and achieves state-of-the-art performance, demonstrating its ability to jointly enhance decision quality and planning safety in complex interactive navigation scenarios.
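
To make the objectives concrete, the sketch below shows how the non-differentiable outcomes mentioned above (collision avoidance and goal reaching) can be computed from a planned trajectory, together with a simple distance-based dense proxy built around a tunable threshold. It is a minimal illustrative sketch, not the paper's reward-weighted dynamic thresholding algorithm: the function names, the fixed `safe_dist`/`goal_tol`/`threshold` values, and the weighting of the two terms are all assumptions.

```python
import numpy as np

def sparse_outcomes(traj, obstacles, goal, safe_dist=0.3, goal_tol=0.5):
    """Non-differentiable outcome metrics for one trajectory (hypothetical sketch).
    traj: (T, 2) planned positions; obstacles: (M, 2); goal: (2,)."""
    d_obs = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)  # (T, M)
    collided = bool((d_obs < safe_dist).any())                    # safety: any collision
    reached = bool(np.linalg.norm(traj[-1] - goal) < goal_tol)    # effectiveness: final goal reach
    return collided, reached

def dense_proxy_reward(traj, obstacles, goal, threshold=1.0):
    """Hypothetical dense per-step proxy: reward clearance from obstacles up to a
    threshold, plus a small term for proximity to the goal."""
    d_obs = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1).min(axis=1)
    safety = np.clip(d_obs, 0.0, threshold) / threshold           # dense signal in [0, 1]
    progress = -np.linalg.norm(traj - goal[None, :], axis=-1)     # closer to goal is better
    return safety + 0.1 * progress                                # (T,) dense reward
```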

πŸ“ Abstract
Safe and effective motion planning is crucial for autonomous robots. Diffusion models excel at capturing complex agent interactions, a fundamental aspect of decision-making in dynamic environments. Recent studies have successfully applied diffusion models to motion planning, demonstrating their competence in handling complex scenarios and accurately predicting multi-modal future trajectories. Despite their effectiveness, diffusion models have limitations in training objectives, as they approximate data distributions rather than explicitly capturing the underlying decision-making dynamics. However, the crux of motion planning lies in non-differentiable downstream objectives, such as safety (collision avoidance) and effectiveness (goal-reaching), which conventional learning algorithms cannot directly optimize. In this paper, we propose a reinforcement learning-based training scheme for diffusion motion planning models, enabling them to effectively learn non-differentiable objectives that explicitly measure safety and effectiveness. Specifically, we introduce a reward-weighted dynamic thresholding algorithm to shape a dense reward signal, facilitating more effective training and outperforming models trained with differentiable objectives. State-of-the-art performance on pedestrian datasets (CrowdNav, ETH-UCY) compared to various baselines demonstrates the versatility of our approach for safe and effective motion planning.
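
A standard recipe for fine-tuning a pre-trained diffusion policy against such rewards, without touching its architecture, is a reward-weighted denoising objective in the spirit of reward-weighted regression: trajectories that score higher under the dense reward contribute more to the usual diffusion loss. The sketch below assumes a generic `model(noisy, t, cond)` noise-prediction interface, a precomputed `alphas_cumprod` schedule, and an exponential weighting with temperature `beta`; it illustrates the general idea, not necessarily the exact training scheme used in the paper.

```python
import torch
import torch.nn.functional as F

def reward_weighted_diffusion_loss(model, trajs, rewards, cond, alphas_cumprod, beta=1.0):
    """Weight each sample's denoising loss by a normalized exp(reward / beta) factor.
    trajs: (B, T, D) trajectories sampled from the current policy,
    rewards: (B,) scalar rewards (e.g. the dense proxy above), cond: conditioning."""
    B = trajs.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=trajs.device)
    noise = torch.randn_like(trajs)
    a_bar = alphas_cumprod.to(trajs.device)[t].view(B, 1, 1)
    noisy = a_bar.sqrt() * trajs + (1.0 - a_bar).sqrt() * noise   # forward diffusion step
    pred = model(noisy, t, cond)                                  # predict the injected noise
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2))
    weights = torch.softmax(rewards / beta, dim=0) * B            # normalized exponential weights
    return (weights.detach() * per_sample).mean()
```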
Problem

Research questions and friction points this paper is trying to address.

Optimizing non-differentiable rewards in diffusion-based motion planning
Enhancing safety and effectiveness in autonomous robot navigation
Improving training for collision avoidance and goal-reaching objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for diffusion motion planning
Reward-weighted dynamic thresholding algorithm
Optimizes non-differentiable safety and effectiveness objectives