🤖 AI Summary
In multi-agent reinforcement learning (MARL), fixed local observation ranges often provide insufficient or redundant information, hindering coordination and scalability. To address this, we propose a dynamic field-of-view (FoV) adaptation mechanism that operates without access to global state or inter-agent communication. Our core innovation is an online FoV selection strategy grounded in the Upper Confidence Bound (UCB) principle, which dynamically adjusts each agent's perception radius based on uncertainty estimates derived from local observations, ensuring both interpretability and efficiency. The method is algorithm-agnostic and integrates seamlessly with standard MARL frameworks (e.g., QMIX, MAPPO) and local observation modeling. Evaluated on benchmark environments including Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and the StarCraft Multi-Agent Challenge (SMAC), our approach consistently improves final policy performance, accelerates training convergence, and automatically identifies stage-optimal FoV configurations throughout training.
📝 Abstract
Multi-agent reinforcement learning (MARL) is often challenged by the sight range dilemma, in which agents receive either insufficient or excessive information from their environment. In this paper, we propose a novel method, called Dynamic Sight Range Selection (DSR), to address this issue. DSR utilizes an Upper Confidence Bound (UCB) algorithm to dynamically adjust the sight range during training. Experimental results show several advantages of using DSR. First, we demonstrate that DSR achieves better performance in three common MARL environments: Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and the StarCraft Multi-Agent Challenge (SMAC). Second, our results show that DSR consistently improves performance across multiple MARL algorithms, including QMIX and MAPPO. Third, DSR identifies suitable sight ranges for different training steps, thereby accelerating the training process. Finally, DSR provides additional interpretability by indicating the optimal sight range used during training. Unlike existing methods that rely on global information or communication mechanisms, our approach operates solely on the individual sight ranges of agents. This offers a practical and efficient solution to the sight range dilemma, making it broadly applicable to complex real-world environments.
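To make the core idea concrete, here is a minimal sketch of how a UCB-style bandit could choose among a discrete set of candidate sight ranges during training. This is an illustrative reconstruction, not the paper's implementation: the class name, the UCB1 formulation, the exploration coefficient `c`, and the use of episode return as the bandit reward are all assumptions for the sake of the example.

```python
import math


class UCBSightRangeSelector:
    """Hypothetical UCB1 bandit over candidate sight ranges.

    Each candidate sight range is treated as a bandit arm; the reward
    is a scalar training signal (e.g., episode return) obtained while
    all agents observe with that range.
    """

    def __init__(self, sight_ranges, c=2.0):
        self.sight_ranges = list(sight_ranges)   # candidate perception radii
        self.c = c                               # exploration coefficient (assumed)
        self.counts = [0] * len(self.sight_ranges)
        self.values = [0.0] * len(self.sight_ranges)  # running mean reward per arm
        self.total = 0                           # total number of pulls

    def select(self):
        # Pull every arm once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # UCB1: pick the arm maximizing mean reward + confidence bonus.
        return max(
            range(len(self.sight_ranges)),
            key=lambda i: self.values[i]
            + math.sqrt(self.c * math.log(self.total) / self.counts[i]),
        )

    def update(self, arm, reward):
        # Incremental update of the arm's mean reward.
        self.counts[arm] += 1
        self.total += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In a training loop, one would call `select()` to pick a sight range for the next batch of episodes, train with that range, and feed the resulting return back via `update()`; the arm with the best recent returns is pulled more often, while under-explored ranges retain a chance of being revisited as training progresses.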