MADiff: Offline Multi-agent Learning with Diffusion Models

📅 2023-05-27
🏛️ arXiv.org
📈 Citations: 22
✨ Influential: 1
🤖 AI Summary
Offline multi-agent reinforcement learning (MARL) must learn coordinated behavior from fixed datasets, and it is unclear how single-agent diffusion methods carry over to this setting. This paper introduces MADiff, which the authors describe as the first diffusion-based multi-agent learning framework. MADiff uses an attention-based diffusion model to capture coordination among the agents' behaviors, and it can serve both as a decentralized policy and as a centralized controller. The design avoids two weaker options: independent per-agent diffusion models, which may impede coordination, and concatenating all agents' information, which can lead to low sample efficiency. During decentralized execution MADiff also performs teammate modeling, and the centralized controller can be applied to multi-agent trajectory prediction. In the authors' experiments, MADiff outperforms baseline algorithms across various multi-agent learning tasks.
📝 Abstract
Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervised learning methods are constrained by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Generating trajectories for each agent with independent DMs may impede coordination, while concatenating all agents' information can lead to low sample efficiency. Accordingly, we propose MADiff, which is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple agents. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework, functioning as both a decentralized policy and a centralized controller. During decentralized executions, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied in multi-agent trajectory predictions. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.
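The abstract contrasts independent per-agent models with concatenating all agents' information, and says MADiff uses attention instead. The sketch below is a minimal illustration of that general idea, not the authors' architecture: it mixes per-agent feature vectors with plain scaled dot-product attention, so each agent's update can depend on the others without flattening them into one long vector. The function names, the toy features, and the absence of learned projections are all assumptions made for illustration; see the linked repository for the actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_agent_attention(features):
    """Mix per-agent feature vectors with scaled dot-product attention.

    features: list of n_agents vectors (lists of floats) of equal length.
    Queries, keys and values are the raw features themselves; a real model
    would use learned projections (assumption: omitted here for brevity).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in features:
        # Similarity of this agent's feature to every agent's feature.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of all agents' features for this agent.
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(dim)
        ])
    return mixed


# Three hypothetical agents, each with a 2-dimensional feature
# at a single denoising step.
agent_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed_features = cross_agent_attention(agent_features)
```

Because the same function is applied to every agent and the weights are computed over the set of agents, reordering the agents simply reorders the outputs, which is one reason attention is a natural fit for exchanging information across agents.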
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Systems
Offline Reinforcement Learning
Cooperative Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Systems
Attention-based Diffusion Model
Offline Reinforcement Learning