🤖 AI Summary
To address motion planning for evaders in partially observable multi-agent pursuit-evasion games, this paper proposes a hierarchical planning framework that integrates diffusion models with reinforcement learning. At the high level, a Denoising Diffusion Probabilistic Model (DDPM) generates interpretable global paths conditioned on environmental observations; at the low level, PPO or SAC policies dynamically fuse path-following with real-time evasive behavior. This paradigm achieves the first tight coupling of diffusion-based prior guidance with RL-based online decision-making, enabling policy decomposition and online behavioral fusion under a partially observable Markov decision process (POMDP) formulation. Evaluated across multiple benchmark scenarios, the approach outperforms baselines by 77.18% on detection rate and 47.38% on goal-reaching rate, for an average gain of 51.4% in the overall performance score. These results address key limitations of conventional RL methods in path adaptability and robustness within dynamic adversarial environments.
📝 Abstract
Reinforcement Learning (RL)-based motion planning has recently shown the potential to outperform traditional approaches in domains ranging from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion game (PEG). Pursuit-evasion problems are relevant to various applications, such as search and rescue operations and surveillance robots, where robots must effectively plan their actions to gather intelligence or accomplish mission tasks while avoiding detection or capture. We propose a hierarchical architecture in which a high-level diffusion model plans global paths responsive to environment data, while a low-level RL policy reasons about evasive versus global path-following behavior. Benchmark results across different domains and different levels of observability show that our approach outperforms baselines by 77.18% and 47.38% on detection and goal-reaching rates, respectively, which leads to a 51.4% average increase in the performance score. Additionally, our method improves the interpretability, flexibility, and efficiency of the learned policy.
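The two-level structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the straight-line `plan_global_path` is only a stand-in for sampling a path from the diffusion model, and the hand-written `low_level_action` is only a stand-in for a trained RL policy. All function names, the 2D point-mass dynamics, and parameters such as `detect_radius` are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def plan_global_path(start: Point, goal: Point, n_waypoints: int = 5) -> List[Point]:
    """Placeholder high-level planner (hypothetical).

    The paper samples global paths from a diffusion model; here we simply
    interpolate evenly spaced waypoints on the straight line to the goal.
    """
    return [
        (start[0] + (goal[0] - start[0]) * k / n_waypoints,
         start[1] + (goal[1] - start[1]) * k / n_waypoints)
        for k in range(1, n_waypoints + 1)
    ]


def low_level_action(pos: Point, waypoint: Point, pursuers: List[Point],
                     detect_radius: float = 2.0) -> Point:
    """Placeholder low-level policy (hypothetical, rule-based, not learned).

    Blends a unit vector toward the current waypoint with repulsion from any
    pursuer closer than detect_radius, then normalizes to a unit direction.
    """
    fx, fy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    f_norm = math.hypot(fx, fy)
    if f_norm > 0:
        fx, fy = fx / f_norm, fy / f_norm

    ex, ey = 0.0, 0.0
    for p in pursuers:
        dx, dy = pos[0] - p[0], pos[1] - p[1]
        d = math.hypot(dx, dy)
        if 0 < d < detect_radius:
            # Closer pursuers push harder.
            ex += dx / d * (detect_radius - d)
            ey += dy / d * (detect_radius - d)

    ax, ay = fx + ex, fy + ey
    a_norm = math.hypot(ax, ay)
    return (ax / a_norm, ay / a_norm) if a_norm > 0 else (0.0, 0.0)


def run_episode(start: Point, goal: Point, pursuers: List[Point],
                step: float = 0.5, max_steps: int = 200,
                tol: float = 0.5) -> Tuple[Point, bool]:
    """Plan once at the high level, then act step by step at the low level."""
    path = plan_global_path(start, goal)
    pos, idx = start, 0
    for _ in range(max_steps):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < tol:
            return pos, True
        # Move on to the next waypoint once the current one is reached.
        while idx < len(path) - 1 and math.hypot(
                path[idx][0] - pos[0], path[idx][1] - pos[1]) < tol:
            idx += 1
        ax, ay = low_level_action(pos, path[idx], pursuers)
        pos = (pos[0] + step * ax, pos[1] + step * ay)
    return pos, False
```

Calling `run_episode((0.0, 0.0), (10.0, 0.0), pursuers=[])` follows the planned waypoints to the goal; adding static pursuer positions near the path makes the low-level step deviate from it. In the setting the abstract describes, both placeholders would be replaced by learned components and the pursuers would move and be only partially observed.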