Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning

📅 2025-05-15

🤖 AI Summary
In cooperative multi-agent reinforcement learning, value function decomposition must satisfy the Individual-Global-Max (IGM) property so that greedy action selection from per-agent utilities agrees with greedy selection from the joint value. Mainstream methods such as VDN and QMIX satisfy IGM but cannot represent the complete class of IGM values, whereas QPLEX, the one method without that limitation, is unnecessarily complex. This paper gives a simple formulation of the full IGM value class and derives from it the QFIX family of models, which extend the representation capabilities of prior models by adding a thin "fixing" layer. Empirical evaluation on SMACv2 and Overcooked environments indicates that QFIX improves on the prior methods it extends, learns more stably and performs better than QPLEX, and does so with the simplest and smallest mixing models.

📝 Abstract
Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities, and train them using a joint objective. To ensure that the action selection process between individual utilities and joint values remains consistent, it is imperative for the composition to satisfy the individual-global max (IGM) property. Although satisfying IGM itself is straightforward, most existing methods (e.g., VDN, QMIX) have limited representation capabilities and are unable to represent the full class of IGM values, and the one exception that has no such limitation (QPLEX) is unnecessarily complex. In this work, we present a simple formulation of the full class of IGM values that naturally leads to the derivation of QFIX, a novel family of value function decomposition models that expand the representation capabilities of prior models by means of a thin "fixing" layer. We derive multiple variants of QFIX, and implement three variants in two well-known multi-agent frameworks. We perform an empirical evaluation on multiple SMACv2 and Overcooked environments, which confirms that QFIX (i) succeeds in enhancing the performance of prior methods, (ii) learns more stably and performs better than its main competitor QPLEX, and (iii) achieves this while employing the simplest and smallest mixing models.
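
To make the IGM property concrete, the following is a minimal sketch, not code from the paper. It builds a joint value table from per-agent utilities using a VDN-style sum and checks that the greedy joint action equals the tuple of each agent's own greedy action. The function names and the utility numbers are hypothetical, and the check assumes each maximizer is unique.

```python
from itertools import product


def greedy_joint_action(joint_q):
    """Return the joint action (a tuple) with the highest joint value."""
    return max(joint_q, key=joint_q.get)


def greedy_individual_actions(utilities):
    """Return each agent's greedy action, chosen from its own utility only."""
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)


def vdn_joint_q(utilities):
    """VDN-style composition: the joint value is the sum of agent utilities."""
    action_ranges = [range(len(u)) for u in utilities]
    return {
        joint: sum(u[a] for u, a in zip(utilities, joint))
        for joint in product(*action_ranges)
    }


def satisfies_igm(joint_q, utilities):
    """IGM (unique-maximizer case): joint greedy == tuple of individual greedy."""
    return greedy_joint_action(joint_q) == greedy_individual_actions(utilities)


# Hypothetical utilities: two agents, three actions each.
utilities = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]
joint_q = vdn_joint_q(utilities)
igm_holds = satisfies_igm(joint_q, utilities)

print(greedy_individual_actions(utilities))  # (1, 2)
print(greedy_joint_action(joint_q))          # (1, 2)
print(igm_holds)                             # True
```

A sum always satisfies this check because raising any one agent's utility can only raise the total, which is why satisfying IGM on its own is described as straightforward.
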
Problem

Research questions and friction points this paper is trying to address.

Enhancing representation of IGM values in multi-agent reinforcement learning
Simplifying complex models for value function decomposition
Improving performance and stability in cooperative multi-agent systems
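
The representation gap in the first point can be illustrated with a small sketch, not taken from the paper. It shows a hypothetical two-agent joint value table that satisfies IGM with respect to a pair of utilities, yet cannot be written as a sum of per-agent terms, so a purely additive (VDN-style) model cannot represent it. All names and numbers are assumptions chosen for illustration.

```python
from itertools import product


def is_additive(joint_q, n_actions):
    """Check whether a two-agent table equals u1[a] + u2[b] for some u1, u2.

    A two-agent table is additive exactly when
    Q[a, b] - Q[a, 0] - Q[0, b] + Q[0, 0] == 0 for every pair (a, b).
    """
    return all(
        abs(joint_q[a, b] - joint_q[a, 0] - joint_q[0, b] + joint_q[0, 0]) < 1e-9
        for a, b in product(range(n_actions), repeat=2)
    )


def satisfies_igm(joint_q, utilities):
    """IGM (unique-maximizer case): joint greedy == tuple of individual greedy."""
    joint_greedy = max(joint_q, key=joint_q.get)
    individual_greedy = tuple(
        max(range(len(u)), key=u.__getitem__) for u in utilities
    )
    return joint_greedy == individual_greedy


# Hypothetical example: two agents, two actions each.
utilities = [[0.0, 1.0], [0.0, 1.0]]
# Only the coordinated joint action (1, 1) pays off.
joint_q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 5.0}

print(satisfies_igm(joint_q, utilities))  # True
print(is_additive(joint_q, 2))            # False
```

The sketch only rules out the additive form; it says nothing about which tables other mixers such as QMIX can or cannot represent.
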
Innovation

Methods, ideas, or system contributions that make the work stand out.

QFIX introduces a thin fixing layer
QFIX expands representation capabilities
QFIX simplifies IGM value decomposition
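
The summary above does not give the form of the fixing layer, so the following is only an illustrative sketch of one way a thin layer on top of an existing decomposition can change joint values without changing the greedy joint action. It is an assumption for illustration, not necessarily the QFIX formulation. The function `fix_joint_values`, the `weights` table, and the `baseline` parameter are all hypothetical.

```python
def fix_joint_values(joint_q, weights, baseline=0.0):
    """Rescale each joint advantage by a positive weight, then add a baseline.

    advantage(a) = joint_q[a] - max(joint_q) is <= 0 and equals 0 only at the
    greedy joint action(s). Multiplying by weights[a] > 0 keeps that sign
    pattern, so the set of maximizing joint actions is unchanged.
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be strictly positive")
    best = max(joint_q.values())
    return {a: weights[a] * (q - best) + baseline for a, q in joint_q.items()}


# Hypothetical additive (VDN-style) joint values for two agents.
utilities = [[0.0, 1.0], [0.0, 2.0]]
joint_q = {
    (a, b): utilities[0][a] + utilities[1][b]
    for a in range(2)
    for b in range(2)
}

# Hypothetical positive weights, one per joint action.
weights = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 3.0, (1, 1): 1.0}
fixed_q = fix_joint_values(joint_q, weights, baseline=10.0)

print(max(joint_q, key=joint_q.get))  # (1, 1)
print(max(fixed_q, key=fixed_q.get))  # (1, 1)
print(fixed_q)  # {(0, 0): 4.0, (0, 1): 9.5, (1, 0): 4.0, (1, 1): 10.0}
```

In this example the rescaled table is no longer a sum of per-agent terms, yet its greedy joint action still matches the agents' individual greedy actions. In a learned model the weights and baseline would presumably be network outputs rather than fixed numbers.
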