🤖 AI Summary
This work addresses the curse of dimensionality in the joint trajectory space of large-scale offline multi-agent reinforcement learning by proposing MF-Diffuser, a mean-field diffusion framework that extends diffusion-based planning to many-agent systems. By modeling trajectory distributions in Wasserstein space and relying on propagation of chaos, the method approximates the dynamics of the full population using only a small set of representative agents. Planning combines a value-weighted chaotic entropy objective, which balances generative fidelity against return maximization, with a hierarchical coarse-to-fine denoising strategy that progressively grows the agent population. The theoretical analysis gives an end-to-end suboptimality bound with four terms: the mean-field approximation error scales as $O(H^2/\sqrt{N})$, the offline distribution-shift error does not grow with the number of agents $N$, and the generated policy is shown to be an approximate mean-field Nash equilibrium. Experiments on three mean-field RL benchmarks show that MF-Diffuser achieves the best return in the majority of settings, with the largest gains on suboptimal offline data and at very large populations (N ≥ 10³).
📝 Abstract
Diffusion-based planning has achieved strong results in single-agent offline reinforcement learning, yet scaling to many-agent systems remains intractable due to the curse of dimensionality in the joint trajectory space. We introduce MF-Diffuser, a framework that lifts trajectory planning to the Wasserstein space of trajectory distributions, where the propagation of chaos ensures a small representative subset of agents captures the full population dynamics. Our approach features a value-weighted chaotic entropy objective that reconciles generative fidelity with return maximization, and a hierarchical coarse-to-fine strategy that progressively grows the agent population during denoising. We establish end-to-end suboptimality bounds with four interpretable terms, revealing that mean-field approximation error scales as $O(H^2/\sqrt{N})$ while offline distribution shift provably does not grow with population size $N$, and prove the generated policy is an approximate mean-field Nash equilibrium with explicit convergence guarantees. Experiments on three mean-field RL benchmarks -- spanning stage games, sequential dynamics, and adversarial team competition -- show MF-Diffuser achieves the best return in the majority of settings, with the largest gains on suboptimal offline data and at extreme scales ($N \geq 10^3$).
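The $1/\sqrt{N}$ dependence in the mean-field approximation term can be illustrated with a toy experiment. The sketch below is not MF-Diffuser and does not model the horizon factor $H^2$; it assumes agents whose states are independent draws from Uniform(0, 1) and measures how far the empirical population mean sits from the true mean as $N$ grows. The function name `mean_field_gap` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
import random
import statistics


def mean_field_gap(n_agents: int, n_trials: int = 200, seed: int = 0) -> float:
    """Average |empirical mean - true mean| over n_trials populations.

    Toy assumption: each agent's state is an independent Uniform(0, 1) draw,
    so the true population mean (the "mean field") is 0.5.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        sample_mean = sum(rng.random() for _ in range(n_agents)) / n_agents
        gaps.append(abs(sample_mean - 0.5))
    return statistics.fmean(gaps)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        gap = mean_field_gap(n)
        # If the gap shrinks like 1/sqrt(N), the rescaled value stays roughly constant.
        print(f"N={n:5d}  gap={gap:.4f}  gap*sqrt(N)={gap * math.sqrt(n):.3f}")
```

Under these assumptions the rescaled column `gap*sqrt(N)` should stay roughly constant across population sizes, which is the behavior a $1/\sqrt{N}$ rate predicts for independent agents. Interacting agents, as in the paper's setting, require the propagation-of-chaos argument rather than this independence assumption.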