🤖 AI Summary
Multi-agent coordination faces a fundamental trade-off between expressiveness and real-time inference: denoising diffusion models offer strong representational capacity but suffer from slow sampling, while Gaussian policies enable efficient inference yet lack expressive joint-action modeling. This paper proposes MAC-Flow, presented as the first framework to bring flow matching to multi-agent joint behavioral representation learning. By explicitly modeling high-dimensional joint action distributions, MAC-Flow substantially improves cooperative expressiveness. The authors further design a distillation mechanism that compresses the flow model into a lightweight, decentralized one-step policy network, preserving modeling fidelity while achieving real-time inference. Extensive experiments across 12 environments and 34 datasets show that MAC-Flow runs inference about 14.5× faster than diffusion-based methods, matching the speed of Gaussian policies, while substantially outperforming them on complex cooperative tasks. MAC-Flow thus alleviates the conventional efficiency–effectiveness trade-off in multi-agent coordination.
📝 Abstract
This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that the requirements of effective coordination are twofold: (i) a rich representation of the diverse joint behaviors present in offline data, and (ii) the ability to act efficiently in real time. However, prior approaches often sacrifice one for the other: denoising diffusion-based solutions capture complex coordination but are computationally slow, while Gaussian policy-based solutions are fast but brittle in handling multi-agent interaction. MAC-Flow addresses this trade-off by first learning a flow-based representation of joint behaviors, and then distilling it into decentralized one-step policies that preserve coordination while enabling fast execution. Across four benchmarks spanning $12$ environments and $34$ datasets, MAC-Flow alleviates the trade-off between performance and computational cost, achieving about $\boldsymbol{\times 14.5}$ faster inference than diffusion-based multi-agent reinforcement learning (MARL) methods while maintaining good performance, and matching the inference speed of prior Gaussian policy-based offline MARL methods.
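The two stages the abstract describes can be sketched in miniature. Below is an illustrative NumPy sketch, not the paper's implementation: the first function builds standard conditional flow-matching training targets (straight-line interpolation between noise and a joint action, with velocity target $x_1 - x_0$), and the second computes the multi-step Euler endpoint of a learned velocity field, which is the regression target a distilled one-step policy would be trained to hit in a single forward pass. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(joint_actions, rng):
    """Conditional flow-matching targets for a batch of joint actions.

    x0 is Gaussian noise, x1 is data; along the straight-line path
    x_t = (1 - t) * x0 + t * x1 the velocity target is simply x1 - x0.
    A velocity network would regress v(x_t, t) onto this target.
    """
    x1 = joint_actions
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(size=(x1.shape[0], 1))
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, t, v_target

def multi_step_endpoint(velocity_fn, x0, n_steps=32):
    """Euler-integrate the (teacher) velocity field from noise to an action.

    A distilled one-step policy is trained to map x0 directly to this
    endpoint, so execution at deployment needs one forward pass instead
    of n_steps sequential evaluations.
    """
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy sanity check: with a constant velocity field v(x, t) = c,
# integrating over t in [0, 1] moves each sample by exactly c.
c = np.array([2.0, -1.0])
x0 = rng.standard_normal((4, 2))
endpoint = multi_step_endpoint(lambda x, t: np.broadcast_to(c, x.shape), x0)
assert np.allclose(endpoint, x0 + c)
```

In this framing, distillation collapses the `n_steps` sequential evaluations inside `multi_step_endpoint` into a single network call per agent, which is where the reported inference speedup over diffusion-style sampling comes from.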