Offline Multi-agent Reinforcement Learning via Score Decomposition

📅 2025-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Offline multi-agent reinforcement learning (MARL) suffers from distributional shift, which is aggravated by the high-dimensional joint action space and by multimodal coordination policies in the offline data. Existing methods therefore remain vulnerable to out-of-distribution (OOD) joint actions and often perform suboptimally. The paper proposes a two-stage diffusion-based framework: first, a diffusion generative model explicitly captures the multimodal behavior policy; second, a sequential score function decomposition regularizes each agent's policy and enables decentralized execution. On continuous-control offline MARL benchmarks, the method is reported to outperform existing methods by 26.3% in normalized returns. The authors present the approach as offering new insights into offline coordination and equilibrium selection in cooperative multi-agent systems.

📝 Abstract

Offline multi-agent reinforcement learning (MARL) faces critical challenges due to distributional shifts, further exacerbated by the high dimensionality of joint action spaces and the diversity in coordination strategies and quality among agents. Conventional approaches, including independent learning frameworks and value decomposition methods based on pessimistic principles, remain susceptible to out-of-distribution (OOD) joint actions and often yield suboptimal performance. Through systematic analysis of prevalent offline MARL benchmarks, we identify that this limitation primarily stems from the inherently multimodal nature of joint collaborative policies induced by offline data collection. To address these challenges, we propose a novel two-stage framework: First, we employ a diffusion-based generative model to explicitly capture the complex behavior policy, enabling accurate modeling of diverse multi-agent coordination patterns. Second, we introduce a sequential score function decomposition mechanism to regularize individual policies and enable decentralized execution. Extensive experiments on continuous control tasks demonstrate state-of-the-art performance across multiple standard offline MARL benchmarks, outperforming existing methods by 26.3% in normalized returns. Our approach provides new insights into offline coordination and equilibrium selection in cooperative multi-agent systems.

Problem

Research questions and friction points this paper is trying to address.

Distributional shift in offline multi-agent reinforcement learning

High-dimensional joint action spaces and diverse, multimodal coordination strategies in the offline data

Vulnerability of independent learning and pessimistic value decomposition methods to out-of-distribution (OOD) joint actions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a diffusion-based generative model to capture the multimodal joint behavior policy

Introduces a sequential score function decomposition mechanism

Regularizes individual policies with the decomposed scores, enabling decentralized execution
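
The summary does not spell out the decomposition itself, so the following is only a sketch of the general identity that a sequential score decomposition can build on: by the chain rule, a joint density over agents' actions factors as p(a1) p(a2 | a1), so the joint score (the gradient of the log-density with respect to the actions) equals the sum of the scores of the sequential factors. The two-agent Gaussian, the function names `joint_score` and `sequential_score`, and all numeric values are assumptions chosen for illustration; the paper learns its scores with a diffusion model over multimodal data, which this toy example does not attempt.

```python
import numpy as np


def joint_score(a, mean, cov):
    """Score of a joint Gaussian: grad_a log N(a; mean, cov) = -cov^{-1} (a - mean)."""
    return -np.linalg.solve(cov, a - mean)


def sequential_score(a, mean, cov):
    """Same score, assembled from the factorization p(a1) * p(a2 | a1).

    Illustrative two-agent case with one scalar action per agent.
    """
    a1, a2 = a
    m1, m2 = mean
    s11, s12, s22 = cov[0, 0], cov[0, 1], cov[1, 1]

    # Parameters of the conditional p(a2 | a1) for a bivariate Gaussian.
    slope = s12 / s11
    cond_mean = m2 + slope * (a1 - m1)
    cond_var = s22 - s12 ** 2 / s11
    resid = a2 - cond_mean

    # Agent 1's action appears in both factors: its own marginal p(a1)
    # and the conditional mean of p(a2 | a1).
    d_a1 = -(a1 - m1) / s11 + slope * resid / cond_var
    # Agent 2's action appears only in p(a2 | a1).
    d_a2 = -resid / cond_var
    return np.array([d_a1, d_a2])


# Hypothetical joint behavior distribution over two agents' actions.
mean = np.array([0.5, -1.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
action = np.array([0.2, 0.3])

scores_match = np.allclose(joint_score(action, mean, cov),
                           sequential_score(action, mean, cov))
```

In this toy setting the two computations agree, which is what lets each agent's score term be read off from a conditional factor rather than from the full joint model. How the paper orders agents and approximates the conditionals is not stated in the summary above.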
