🤖 AI Summary
This work investigates the fundamental differences in robustness and performance between single-agent reinforcement learning (SARL) and multi-agent reinforcement learning (MARL) under partial observability. Theoretically, we prove that SARL and MARL are equivalent under full observability, whereas local observability is a necessary condition for MARL to achieve enhanced robustness. Building on this insight, we derive an upper bound on the performance degradation of local policies under system disturbances. Methodologically, we integrate distributed MARL with Lyapunov stability analysis and validate the approach on both a mobile manipulation robot platform and the Multi-Agent Particle Environment (MPE) benchmark. Results show that MARL matches centralized methods in simulation performance while demonstrating significantly improved robustness to agent failures and environmental perturbations in real-world experiments. Our core contribution is the first formal characterization of the capability boundaries between SARL and MARL from an observability perspective, together with a theoretical guarantee on the robustness of local policies.
📝 Abstract
While many robotic tasks can be addressed through either centralized single-agent control with full state observation or decentralized multi-agent control, clear criteria for selecting between the two are lacking. This paper presents a comprehensive investigation into how multi-agent reinforcement learning (MARL) with local observations can enhance robustness in complex robotic systems compared to traditional centralized control methods. We provide both theoretical analysis and empirical validation demonstrating that, in certain tasks, decentralized MARL controllers can achieve performance comparable to centralized approaches while offering superior robustness against perturbations and agent failures. Our theoretical contributions are twofold: an analytical proof that single-agent reinforcement learning (SARL) and MARL are equivalent under full observability, which identifies observability as the key distinguishing factor between the two paradigms, and a derivation of performance degradation bounds for locally observable policies under external perturbations. Empirical validation on standard MARL benchmarks confirms that locally observable MARL maintains competitive performance despite limited observations. Real-world experiments with a mobile manipulation robot demonstrate that our decentralized MARL controllers are significantly more robust to both agent malfunctions and environmental disturbances than centralized baselines. This systematic investigation provides crucial insights for designing robust and generalizable control strategies in complex robotic systems, establishing MARL with local observations as a viable alternative to traditional centralized control paradigms.
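The distinction the abstract draws hinges on what each controller observes. As a toy illustration only (a hypothetical sketch, not the paper's implementation; the `Agent` class, `radius` cutoff, and helper names are all assumptions for exposition), the snippet below contrasts the full-state observation available to a centralized SARL controller with the radius-limited local observation a decentralized MARL agent might receive:

```python
# Hypothetical sketch: full observability (centralized control) vs.
# local observability (decentralized MARL). Not the paper's code.
from dataclasses import dataclass
import math

@dataclass
class Agent:
    x: float
    y: float

def centralized_observation(agents):
    """Full state: every agent's position, flattened into one vector."""
    return [c for a in agents for c in (a.x, a.y)]

def local_observation(agents, i, radius=1.0):
    """Agent i's local view: its own position plus the relative
    positions of neighbors within `radius` (local observability)."""
    me = agents[i]
    obs = [me.x, me.y]
    for j, other in enumerate(agents):
        if j == i:
            continue
        dx, dy = other.x - me.x, other.y - me.y
        if math.hypot(dx, dy) <= radius:
            obs += [dx, dy]
    return obs

agents = [Agent(0.0, 0.0), Agent(0.5, 0.0), Agent(3.0, 3.0)]
full = centralized_observation(agents)   # all three agents visible
local = local_observation(agents, 0)     # only the nearby agent visible
```

Under this toy setup, a centralized policy always conditions on the full 6-dimensional state, while agent 0's local policy sees only itself and its one in-range neighbor; the paper's theory concerns exactly this information gap.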