🤖 AI Summary
This work addresses decentralized combinatorial optimization in dynamic multi-agent systems. We propose a hierarchical framework integrating reinforcement learning and collective learning: a high-level multi-agent reinforcement learning (MARL) module provides strategic guidance, while a low-level distributed collective learning mechanism enables cooperative decision-making. The architecture preserves agent autonomy while achieving scalability through action-space compression and minimal communication overhead, balancing long-term strategic planning, short-term collective performance, and environmental adaptability. Our key contribution is the first integration of MARL with decentralized collective learning to establish a scalable, Pareto-optimal evolutionary mechanism. Experiments on synthetic benchmarks and real-world smart-city applications—including energy self-management and drone swarm sensing—demonstrate significant improvements over standalone MARL and collective learning baselines, yielding superior optimization performance, system scalability, and robustness.
📝 Abstract
Decentralized combinatorial optimization in evolving multi-agent systems poses significant challenges: agents must balance long-term decision-making and short-term collective optimization while preserving the autonomy of interacting agents under unanticipated changes. Reinforcement learning offers a way to model sequential decision-making via dynamic programming, anticipating future environmental changes. However, applying multi-agent reinforcement learning (MARL) to decentralized combinatorial optimization remains an open challenge due to the exponential growth of the joint state-action space, high communication overhead, and privacy concerns in centralized training. To address these limitations, this paper proposes Hierarchical Reinforcement and Collective Learning (HRCL), a novel approach that combines MARL and decentralized collective learning in a hierarchical framework. At the high level, agents use MARL to select strategies that group candidate plans, reducing the action space and constraining agent behavior toward Pareto optimality. Meanwhile, the low-level collective learning layer enables efficient, decentralized coordination among agents with minimal communication. Extensive experiments on a synthetic scenario and real-world smart-city application models, including energy self-management and drone swarm sensing, demonstrate that HRCL significantly improves performance, scalability, and adaptability compared with standalone MARL and collective learning, achieving a win-win synthesis of the two approaches.
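The two-layer structure described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the agent class, the epsilon-greedy stand-in for the MARL policy, and the greedy sum-matching stand-in for collective learning are all hypothetical simplifications, chosen only to show how a high-level policy over *plan groups* (compressing the action space) can be composed with a low-level decentralized plan selection that shares just a running aggregate.

```python
import random

class HRCLAgent:
    """Toy sketch of one HRCL-style agent (illustrative only, not the
    paper's actual algorithm)."""

    def __init__(self, agent_id, plans, num_groups, seed=0):
        self.rng = random.Random(seed + agent_id)  # deterministic per agent
        # Partition the full plan set into groups: the high-level policy
        # chooses a *group* (a compressed action) instead of a raw plan.
        self.groups = [plans[i::num_groups] for i in range(num_groups)]
        self.q = [0.0] * num_groups  # per-group value estimates

    def high_level_select(self, epsilon=0.2):
        """Epsilon-greedy stand-in for the MARL policy over plan groups."""
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.groups))
        return max(range(len(self.groups)), key=lambda i: self.q[i])

    def low_level_select(self, group, running_sum, target):
        """Stand-in for collective learning: pick the plan in the chosen
        group that moves the shared running sum closest to the target,
        using only that aggregate (minimal communication)."""
        return min(self.groups[group],
                   key=lambda p: abs(running_sum + p - target))

    def update(self, group, reward, lr=0.1):
        """Update the chosen group's value toward the shared reward."""
        self.q[group] += lr * (reward - self.q[group])


def hrcl_round(agents, target):
    """One coordination round: each agent picks a group (high level), then
    concrete plans are chosen sequentially, sharing only a running sum
    (low level). The collective cost becomes a shared reward signal."""
    running, choices = 0.0, []
    for a in agents:
        g = a.high_level_select()
        running += a.low_level_select(g, running, target)
        choices.append(g)
    reward = -abs(running - target)  # negative collective cost
    for a, g in zip(agents, choices):
        a.update(g, reward)
    return running, reward
```

A usage example under the same toy assumptions: agents whose plans are scalar resource levels coordinate toward a collective target, e.g. `agents = [HRCLAgent(i, [1, 2, 3, 4, 5, 6], num_groups=2) for i in range(4)]`, with repeated calls to `hrcl_round(agents, target=10.0)`. The design point being illustrated is the compression step: the learned policy ranges over `num_groups` options rather than the full plan set, while the low-level pass needs only a single aggregate value passed between agents.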