🤖 AI Summary
To address the scalability bottleneck in multi-agent reinforcement learning (MARL) arising from the exponential growth of the joint action space with the number of agents, this paper proposes a centralized learning framework based on sequential abstraction. The core innovation is the introduction of a “supervisor” meta-agent that decouples the high-dimensional joint action into a temporally ordered sequence of individual actions, thereby enabling structured dimensionality reduction of the joint action space. By decomposing the action space sequentially and parameterizing the joint policy with lightweight components, the method preserves global coordination while significantly reducing computational complexity. Experiments across multi-agent tasks of varying scales demonstrate that the approach outperforms existing centralized methods in training efficiency, convergence speed, and cooperative performance, while also exhibiting strong generalization capability and practical deployability.
📝 Abstract
In this article, we propose a centralized Multi-Agent Learning framework for learning a policy that models the simultaneous behavior of multiple agents that must coordinate to solve a given task. Centralized approaches often suffer from an explosion of the action space, which is defined by all possible combinations of individual actions, known as joint actions. Our approach addresses the coordination problem via a sequential abstraction, which overcomes the scalability problems typical of centralized methods. It introduces a meta-agent, called the supervisor, which abstracts joint actions as sequential assignments of actions to each agent. This sequential abstraction not only simplifies the centralized joint action space but also enhances the framework's scalability and efficiency. Our experimental results demonstrate that the proposed approach successfully coordinates agents across a variety of Multi-Agent Learning environments of diverse sizes.
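The size contrast between the flat joint action space and the supervisor's sequential assignment can be made concrete with a minimal sketch. This is an illustration of the general idea only, not the paper's implementation: the function names and the `pick` policy are hypothetical, standing in for the supervisor's learned per-agent decision rule.

```python
import itertools

def joint_action_space(n_agents, actions):
    # Centralized baseline: enumerate every combination of individual
    # actions. The number of joint actions is |A| ** n_agents.
    return list(itertools.product(actions, repeat=n_agents))

def sequential_assignment(n_agents, actions, pick):
    # Supervisor-style abstraction: assign an action to one agent at a
    # time, conditioning each choice on the assignments made so far.
    # This replaces one decision over |A| ** n options with n_agents
    # decisions over |A| options each.
    assigned = []
    for agent in range(n_agents):
        assigned.append(pick(agent, assigned, actions))
    return tuple(assigned)

actions = ["up", "down", "left", "right"]

# Flat joint space: 4 ** 5 = 1024 joint actions for just 5 agents.
print(len(joint_action_space(5, actions)))

# Sequential abstraction: 5 decisions of size 4. The `pick` rule here
# is a stand-in for the supervisor's policy.
pick = lambda agent, assigned, acts: acts[agent % len(acts)]
print(sequential_assignment(5, actions, pick))
```

The sketch shows only the combinatorial bookkeeping; in the actual framework the per-agent choices would come from a learned policy that still sees global state, which is how coordination is preserved despite the decomposition.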