Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the insufficient robustness of multi-agent reinforcement learning (MARL) under sudden failures of subsets of agents, this paper proposes MARTA, a plug-and-play training framework. Methodologically, MARTA formulates a budget-constrained adversarial Markov game in which targeted agent failures are modelled via Markov switching controls, and it jointly optimizes cooperative policies and adversarial failure policies. The authors establish theoretical convergence to a Markov perfect equilibrium. The key contribution is the first integration of budget-limited adversarial training with switching control into MARL-based fault tolerance, enabling dynamic game-theoretic modelling of failure simulation and coordinated response. Empirical evaluation on the Multi-Agent Particle World and Level-Based Foraging benchmarks demonstrates that MARTA significantly improves task success rates and system stability, achieving state-of-the-art fault-tolerant performance.

📝 Abstract
In multi-agent systems, the safe and reliable execution of tasks often depends on agents correctly coordinating their actions. However, in real-world deployments, failures of computational components are inevitable, presenting a critical challenge: ensuring that multi-agent reinforcement learning (MARL) policies remain effective even when some agents malfunction. We propose the Multi-Agent Robust Training Algorithm (MARTA), a plug-and-play framework for training MARL agents to be resilient to potentially severe faults. MARTA operates in cooperative multi-agent settings where agents may lose the ability to execute their intended actions. It learns to identify failure scenarios that are especially detrimental to system performance and equips agents with strategies to mitigate their impact. At the heart of MARTA is a novel adversarial Markov game in which an adversary, modelled via *Markov switching controls*, learns to disable agents in high-risk state regions, while the remaining agents are trained to *jointly* best-respond to such targeted malfunctions. To ensure practicality, MARTA enforces a malfunction budget, constraining the adversary to a fixed number of failures and learning robust policies accordingly. We provide theoretical guarantees that MARTA converges to a Markov perfect equilibrium, ensuring agents optimally counteract worst-case faults. Empirically, we show that MARTA achieves state-of-the-art fault-tolerant performance across benchmark environments, including Multi-Agent Particle World and Level-Based Foraging.
Problem

Research questions and friction points this paper is trying to address.

Ensuring MARL policies work with agent malfunctions
Training agents to mitigate severe failure impacts
Adversarial learning with constrained malfunction budgets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Markov game for fault resilience
Markov switching controls model adversary
Malfunction budget ensures practical robustness
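The interaction described above can be illustrated with a toy rollout: an adversary may switch off agents (a Markov switching control) but only a budgeted number of times per episode, while the surviving agents carry the task. This is a minimal sketch under assumed toy dynamics; the function name, the random failure heuristic, and the reward shaping are all illustrative, not the paper's actual implementation.

```python
import random

def adversarial_episode(n_agents, horizon, budget, seed=0):
    """Toy rollout of a budget-constrained adversarial Markov game.

    At each step the adversary may disable one live agent (a switching
    control), but only `budget` times per episode; the cooperative
    team's per-step reward is the fraction of agents still operating.
    """
    rng = random.Random(seed)
    disabled = set()       # agents the adversary has switched off
    remaining = budget     # adversary's remaining malfunction budget
    reward = 0.0
    for _ in range(horizon):
        # Adversary move: occasionally spend budget to disable an agent.
        if remaining > 0 and rng.random() < 0.5:
            live = [i for i in range(n_agents) if i not in disabled]
            if live:
                disabled.add(rng.choice(live))
                remaining -= 1
        # Team move: only live agents contribute to task progress.
        live_count = n_agents - len(disabled)
        reward += live_count / n_agents
    return reward, len(disabled)

episode_reward, faults = adversarial_episode(n_agents=4, horizon=10, budget=2)
```

In MARTA itself, both the adversary's switching policy and the team's joint best-response are learned rather than random, and training alternates between them until the budget-constrained game reaches equilibrium.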