🤖 AI Summary
This work addresses training instability in offline multi-agent reinforcement learning (MARL) caused by nonlinear value decomposition, which often amplifies the magnitude of learned values and hinders the use of complex hybrid networks. To mitigate this issue, the authors propose Scale-Invariant Value Normalization (SVN), a method that stabilizes training without altering the Bellman fixed point. By systematically integrating SVN with key components such as the actor-critic framework and value regularization, they develop the first stable offline MARL algorithm capable of supporting nonlinear value decomposition. Experimental results show that the proposed approach significantly improves both training stability and performance, unlocking the potential of sophisticated hybrid network architectures in offline multi-agent settings.
📝 Abstract
Despite remarkable achievements in single-agent offline reinforcement learning (RL), multi-agent RL (MARL) has struggled to adopt this paradigm and has largely persisted with on-policy training and self-play from scratch. One reason for this gap is the instability of non-linear value decomposition, which has led prior works to avoid complex mixing networks in favor of linear value decomposition (e.g., VDN) combined with the value regularization used in single-agent setups. In this work, we analyze the source of instability in non-linear value decomposition within the offline MARL setting. Our observations confirm that non-linear mixing networks induce value-scale amplification and unstable optimization. To alleviate this, we propose a simple technique, scale-invariant value normalization (SVN), that stabilizes actor-critic training without altering the Bellman fixed point. Empirically, we examine the interaction among key components of offline MARL (e.g., value decomposition, value learning, and policy extraction) and derive a practical recipe that unlocks the full potential of offline MARL.
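As a rough illustration of two ideas named above, the following sketch shows linear value decomposition in the VDN style (the joint value is the sum of per-agent Q-values) and one generic way to make a TD loss scale-invariant without moving its fixed point. The normalization shown is an assumption made for illustration and is not necessarily the paper's SVN formulation; the function names `vdn_mix`, `td_target`, and `scale_normalized_td_loss` are hypothetical.

```python
import numpy as np


def vdn_mix(agent_qs: np.ndarray) -> np.ndarray:
    """Linear value decomposition (VDN): Q_tot is the sum of per-agent Q-values.

    agent_qs has shape (batch, n_agents); the result has shape (batch,).
    """
    return agent_qs.sum(axis=-1)


def td_target(rewards: np.ndarray, dones: np.ndarray,
              next_q_tot: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """One-step Bellman target r + gamma * (1 - done) * Q_tot(s')."""
    return rewards + gamma * (1.0 - dones) * next_q_tot


def scale_normalized_td_loss(q_pred: np.ndarray, target: np.ndarray,
                             eps: float = 1e-6) -> float:
    """Hypothetical scale-normalized TD loss (an assumption, not the paper's SVN).

    Prediction and target are divided by the same positive scale, estimated
    from the target batch. NumPy computes no gradients; in an autodiff
    framework this scale would be held constant (stop-gradient). Dividing the
    squared error by a positive constant changes its magnitude but not its
    minimizer, so q_pred == target still yields zero loss.
    """
    scale = np.mean(np.abs(target)) + eps
    return float(np.mean(((q_pred - target) / scale) ** 2))


# Usage: per-agent Q-values for a batch of 4 transitions with 3 agents.
rng = np.random.default_rng(0)
agent_qs = rng.normal(size=(4, 3))
q_tot = vdn_mix(agent_qs)
y = td_target(np.ones(4), np.zeros(4), q_tot, gamma=0.9)

loss = scale_normalized_td_loss(q_tot, y)
# Rescaling every value by the same factor leaves the loss nearly unchanged.
loss_scaled = scale_normalized_td_loss(1000.0 * q_tot, 1000.0 * y)
print(loss, loss_scaled)
```

Because the scale is recomputed from each batch and treated as a constant, multiplying all values by a common factor barely changes the loss (only `eps` breaks exact invariance), while a prediction equal to its target still gives zero loss. This is the sense in which such a normalization can leave the Bellman fixed point unchanged.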