🤖 AI Summary
Existing multi-agent reinforcement learning (MARL) approaches typically bind behaviors statically to agent identities, which limits their ability to adapt as task conditions change. This work proposes an event-triggered paradigm that decouples agent identity from behavior: it constructs a continuous behavior manifold and uses an event-driven hypernetwork to generate LoRA modules that reconfigure a shared team policy on the fly. To build the manifold, the method introduces Neural Manifold Diversity (NMD), a distance metric that stays well-defined when behaviors are transient and agent-agnostic, and the authors prove that the construction keeps diversity from interfering with reward maximization. Experiments show that the approach outperforms established baselines across multiple benchmarks, generalizes zero-shot, and is the only method evaluated that solves tasks requiring sequential reassignment of behaviors.
📝 Abstract
Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity remain limited because they bind fixed behaviors to fixed agent identities. Consequently, they are ill-equipped for tasks where agents need to take on different roles at specific moments in time. We argue that the missing ingredient for defining these behavioral transitions is events: changes in the state of the system that induce qualitative changes in the task. Based on this view, we introduce a framework that decouples agent identity from behavior, capturing a continuous manifold from which agents instantiate their behaviors in response to events. The framework rests on two elements. First, to build an expressive behavior manifold, we introduce Neural Manifold Diversity (NMD), a formal distance metric that remains well-defined when behaviors are transient and agent-agnostic. Second, we use an event-based hypernetwork that generates Low-Rank Adaptation (LoRA) modules over a shared team policy, enabling on-the-fly agent-policy reconfiguration in response to events. We prove that, by design, this construction ensures that diversity does not interfere with reward maximization. Empirical results demonstrate that our framework outperforms established baselines across benchmarks, exhibits zero-shot generalization, and is the only method that solves tasks requiring sequential behavior reassignment.
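To make the second element more concrete, the following is a minimal NumPy sketch of how an event-conditioned hypernetwork could emit LoRA factors that adjust a shared policy layer. It is an illustration under stated assumptions, not the authors' implementation: the single linear policy layer, the linear hypernetwork, the zero initialization of the `B` factor, and all names (`init_params`, `lora_from_event`, `policy_logits`, `H_A`, `H_B`) are hypothetical choices made for this example.

```python
import numpy as np


def init_params(rng, obs_dim, act_dim, event_dim, rank):
    """Initialise a shared policy layer and a linear hypernetwork (assumed shapes)."""
    return {
        # Shared team policy: one linear layer mapping observations to action logits.
        "W": rng.normal(0.0, 0.1, size=(act_dim, obs_dim)),
        # Hypernetwork head that produces the LoRA "A" factor from an event embedding.
        "H_A": rng.normal(0.0, 0.1, size=(rank * obs_dim, event_dim)),
        # Hypernetwork head for the LoRA "B" factor; zero-initialised so the
        # low-rank update starts at zero (a common LoRA convention, assumed here).
        "H_B": np.zeros((act_dim * rank, event_dim)),
    }


def lora_from_event(params, event, rank):
    """Map an event embedding to low-rank factors A (rank x obs) and B (act x rank)."""
    act_dim, obs_dim = params["W"].shape
    A = (params["H_A"] @ event).reshape(rank, obs_dim)
    B = (params["H_B"] @ event).reshape(act_dim, rank)
    return A, B


def policy_logits(params, obs, event, rank):
    """Apply the shared policy with an event-specific low-rank update W + B @ A."""
    A, B = lora_from_event(params, event, rank)
    W_eff = params["W"] + B @ A
    return W_eff @ obs
```

In this sketch, behavior depends only on the event embedding passed to `policy_logits`, not on which agent calls it, which mirrors the decoupling of identity from behavior described above. Two agents receiving the same event would share the same effective weights, and one agent receiving a new event would switch behavior without any change to the shared matrix `W`.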