🤖 AI Summary
To address the challenges of dynamic environmental belief evolution, low collaborative efficiency, and budget constraints in Multi-Agent Information Path Planning (MAIPP), this paper proposes the first decentralized path planning framework based on diffusion models. Methodologically: (1) a non-autoregressive diffusion model generates long-horizon intent trajectories, eliminating error accumulation inherent in autoregressive modeling; (2) a novel reinforcement learning algorithm—Diffusion-based Proximal Policy Optimization (DPPO)—integrates behavioral cloning with diffusion-based policy learning to enable long-horizon strategy modeling and online decentralized decision-making. The key contribution is the first application of diffusion models to MAIPP, significantly enhancing scalability and robustness. Experiments demonstrate up to a 17% improvement in information gain, a 4× speedup in execution time, and effective support for large-scale agent deployment compared to state-of-the-art baselines.
📝 Abstract
Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as"intent"to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving up to 4x faster execution and 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.