🤖 AI Summary
Most existing multi-agent reinforcement learning (MARL) research emphasizes intra-group behavioral consistency while neglecting the joint optimization of intra-group cooperation and inter-group specialization in multi-group settings. To address this, we propose a novel two-level behavioral consistency framework that, for the first time, jointly models and constrains intra-group policy alignment and inter-group policy differentiation within a unified architecture. Our approach incorporates a dynamic grouping mechanism, explicit functional constraints on policies, and behavioral diversity regularization—enabling algorithm-agnostic design without requiring centralized training or explicit communication. Empirical evaluation across diverse collaborative benchmarks demonstrates that our method significantly improves both intra-group cooperation efficiency and inter-group task specialization. Specifically, it achieves an average 12.7% increase in task completion rate and a 3.2× improvement in inter-group behavioral divergence over state-of-the-art MARL algorithms, validating its effectiveness and generalizability.
📝 Abstract
Behavioral diversity in Multi-agent reinforcement learning(MARL) represents an emerging and promising research area. Prior work has largely centered on intra-group behavioral consistency in multi-agent systems, with limited attention given to behavioral consistency in multi-agent grouping scenarios. In this paper, we introduce Dual-Level Behavioral Consistency (DLBC), a novel MARL control method designed to explicitly regulate agent behaviors at both intra-group and inter-group levels. DLBC partitions agents into distinct groups and dynamically modulates behavioral diversity both within and between these groups. By dynamically modulating behavioral diversity within and between these groups, DLBC achieves enhanced division of labor through inter-group consistency, which constrains behavioral strategies across different groups. Simultaneously, intra-group consistency, achieved by aligning behavioral strategies within each group, fosters stronger intra-group cooperation. Crucially, DLBC's direct constraint of agent policy functions ensures its broad applicability across various algorithmic frameworks. Experimental results in various grouping cooperation scenarios demonstrate that DLBC significantly enhances both intra-group cooperative performance and inter-group task specialization, yielding substantial performance improvements. DLBC provides new ideas for behavioral consistency control of multi-intelligent body systems, and its potential for application in more complex tasks and dynamic environments can be further explored in the future.