🤖 AI Summary
To address the instability and poor convergence of policy training in multi-agent reinforcement learning (MARL) caused by environmental stochasticity and uncertainty, this paper proposes a novel method integrating distributional reinforcement learning with barrier-function-based safety constraints. Our approach innovatively embeds fault-driven safety metrics into a distributed MARL framework and introduces a barrier-function-derived safety loss term, enabling risk-aware early-stage safe exploration and facilitating stable, cooperative policy convergence. Evaluated on the StarCraft II micromanagement benchmark, our method achieves significantly faster convergence, reduces safety violation rates by 37%, and improves task completion rate by 12.5% over the current state-of-the-art. To the best of our knowledge, this is the first work to jointly optimize safety guarantees and collaborative performance in distributed MARL.
📝 Abstract
Multi-Agent Reinforcement Learning (MARL) has gained significant traction for solving complex real-world tasks, but the inherent stochasticity and uncertainty in these environments pose substantial challenges to efficient and robust policy learning. While Distributional Reinforcement Learning has been successfully applied in single-agent settings to address risk and uncertainty, its application in MARL is substantially limited. In this work, we propose a novel approach that integrates distributional learning with a safety-focused loss function to improve convergence in cooperative MARL tasks. Specifically, we introduce a Barrier Function based loss that leverages safety metrics, identified from inherent faults in the system, into the policy learning process. This additional loss term helps mitigate risks and encourages safer exploration during the early stages of training. We evaluate our method in the StarCraft II micromanagement benchmark, where our approach demonstrates improved convergence and outperforms state-of-the-art baselines in terms of both safety and task completion. Our results suggest that incorporating safety considerations can significantly enhance learning performance in complex, multi-agent environments.