Diffusion-Reinforcement Learning Hierarchical Motion Planning in Adversarial Multi-agent Games

📅 2024-03-16

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

To address motion planning for evaders in partially observable multi-agent pursuit-evasion games, this paper proposes a hierarchical planning framework that combines diffusion models with reinforcement learning. At the high level, a Denoising Diffusion Probabilistic Model (DDPM) generates interpretable global paths conditioned on environmental observations; at the low level, a PPO or SAC policy decides at each step between following the global path and evading pursuers. This is presented as the first tight coupling of a diffusion-based prior with RL-based online decision-making, with the policy decomposed into two levels under a partially observable Markov decision process (POMDP) formulation. Across multiple benchmark scenarios, the approach outperforms baselines by 77.18% on detection rate and 47.38% on goal-reaching rate, for an average 51.4% gain in the overall performance score, addressing limitations of conventional RL methods in path adaptability and robustness in dynamic adversarial environments.
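The high-level planner described above samples a path by iteratively denoising random noise. The sketch below shows generic DDPM ancestral sampling for a 2-D waypoint sequence. It is an illustration under stated assumptions, not the paper's implementation: `placeholder_eps_model`, the linear noise schedule, the horizon, and the step count are all hypothetical choices, and a real system would use a trained noise-prediction network conditioned on observations.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumption; the paper's schedule is not given here).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def placeholder_eps_model(x, t, cond):
    # Hypothetical stand-in for a trained noise-prediction network.
    # It ignores the conditioning and predicts zero noise.
    return np.zeros_like(x)

def sample_path(eps_model, cond, horizon=16, num_steps=50, seed=0):
    """Run the DDPM reverse process and return a (horizon, 2) waypoint array."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((horizon, 2))  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, cond)
        # Posterior mean of the reverse step given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x

path = sample_path(placeholder_eps_model, cond=None)
```

With the placeholder model the output is just noise; the point is the structure of the sampling loop, into which a trained, observation-conditioned model would be substituted.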

📝 Abstract

Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches in domains ranging from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture that integrates a high-level diffusion model to plan global paths responsive to environment data, while a low-level RL policy reasons about evasive versus global path-following behavior. Benchmark results across different domains and levels of observability show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal-reaching rate, respectively, which leads to a 51.4% increase in the performance score on average. Additionally, our method improves the interpretability, flexibility, and efficiency of the learned policy.

Problem

Research questions and friction points this paper is trying to address.

Motion planning for evasive targets in adversarial games

Hierarchical integration of diffusion models and RL policies

Improving detection avoidance and goal achievement rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical architecture integrates diffusion and RL

Diffusion model plans global paths from data

RL policy handles evasive and path-following behavior
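The division of labor listed above can be sketched as a two-level control loop. This is a minimal illustration under loud assumptions: `plan_global_path` is a straight-line placeholder for the diffusion planner, `low_level_action` is a hand-written rule standing in for the learned RL policy, and `danger_radius`, `max_step`, and `reach_tol` are hypothetical parameters that do not come from the paper.

```python
import numpy as np

def plan_global_path(start, goal, num_waypoints=5):
    # Placeholder for the high-level diffusion planner: evenly spaced waypoints.
    return np.linspace(start, goal, num_waypoints + 1)[1:]

def low_level_action(pos, waypoint, pursuers, danger_radius=2.0, max_step=1.0):
    # Placeholder for the low-level RL policy: evade if a pursuer is close,
    # otherwise step toward the current waypoint.
    pursuers = np.asarray(pursuers, dtype=float)
    if pursuers.size > 0:
        offsets = pos - pursuers
        dists = np.linalg.norm(offsets, axis=1)
        nearest = int(np.argmin(dists))
        if 1e-9 < dists[nearest] < danger_radius:
            return offsets[nearest] / dists[nearest] * max_step  # move directly away
    to_wp = waypoint - pos
    dist = np.linalg.norm(to_wp)
    if dist < 1e-9:
        return np.zeros(2)
    return to_wp / dist * min(max_step, dist)  # do not overshoot the waypoint

def run_episode(start, goal, pursuers, max_steps=100, reach_tol=0.1):
    pos = np.asarray(start, dtype=float)
    waypoints = list(plan_global_path(pos, np.asarray(goal, dtype=float)))
    trajectory = [pos.copy()]
    for _ in range(max_steps):
        if not waypoints:
            break  # all waypoints reached
        pos = pos + low_level_action(pos, waypoints[0], pursuers)
        trajectory.append(pos.copy())
        if np.linalg.norm(pos - waypoints[0]) < reach_tol:
            waypoints.pop(0)
    return np.array(trajectory)

traj = run_episode([0.0, 0.0], [10.0, 0.0], np.empty((0, 2)))
```

Pursuers are static in this sketch, so an evader blocked by one may oscillate until `max_steps` runs out; a learned policy and moving pursuers would replace both placeholders in a real setup.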
