Dynamic Sight Range Selection in Multi-Agent Reinforcement Learning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL), a fixed local observation range often gives agents either too little or too much information, which hinders coordination and scalability. To address this, the paper proposes a dynamic field-of-view (FoV), or sight range, adaptation mechanism that operates without access to global state or inter-agent communication. The core idea is an online FoV selection strategy grounded in the Upper Confidence Bound (UCB) principle, which adjusts each agent's perception radius during training based on uncertainty estimates derived from local observations. Because the selected radius is visible at every stage of training, the method is also interpretable. It is algorithm-agnostic and integrates with standard MARL algorithms such as QMIX and MAPPO. Evaluated on three benchmark environments, Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and StarCraft Multi-Agent Challenge (SMAC), the approach improves final policy performance, accelerates training convergence, and identifies suitable FoV configurations for different stages of training.

📝 Abstract
Multi-agent reinforcement learning (MARL) is often challenged by the sight range dilemma, where agents receive either insufficient or excessive information from their environment. In this paper, we propose a novel method, called Dynamic Sight Range Selection (DSR), to address this issue. DSR uses an Upper Confidence Bound (UCB) algorithm to dynamically adjust the sight range during training. Experimental results show several advantages of DSR. First, DSR achieves better performance in three common MARL environments: Level-Based Foraging (LBF), Multi-Robot Warehouse (RWARE), and StarCraft Multi-Agent Challenge (SMAC). Second, DSR consistently improves performance across multiple MARL algorithms, including QMIX and MAPPO. Third, DSR selects suitable sight ranges for different training steps, thereby accelerating the training process. Finally, DSR provides additional interpretability by indicating the optimal sight range used during training. Unlike existing methods that rely on global information or communication mechanisms, our approach operates solely on the individual sight ranges of agents. It therefore offers a practical and efficient solution to the sight range dilemma, making it broadly applicable to complex real-world environments.
Problem

Research questions and friction points this paper is trying to address.

Addresses sight range dilemma in MARL with dynamic adjustment
Improves performance across multiple MARL environments and algorithms
Enhances training speed and interpretability without global information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Sight Range Selection (DSR) method
UCB algorithm for dynamic adjustment
Relies only on agents' individual sight ranges, without global information or communication
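
To make the UCB idea listed above concrete, here is a minimal Python sketch that treats each candidate sight range as a bandit arm and picks one per episode with the standard UCB1 rule. It is an illustration under stated assumptions, not the paper's implementation: the class name `UCBSightRangeSelector`, the candidate set `[1, 2, 3, 4, 5]`, the exploration weight `c`, and the toy `simulated_return` function (a stand-in for a real episode return) are all hypothetical, and this summary does not specify which reward signal or UCB variant DSR actually uses.

```python
import math
import random


class UCBSightRangeSelector:
    """UCB1 bandit over a discrete set of candidate sight ranges (hypothetical helper)."""

    def __init__(self, sight_ranges, c=2.0):
        self.sight_ranges = list(sight_ranges)
        self.c = c  # exploration weight (assumed value, not from the paper)
        self.counts = [0] * len(self.sight_ranges)   # times each range was used
        self.means = [0.0] * len(self.sight_ranges)  # running mean return per range
        self.total = 0

    def select(self):
        # Use every candidate range once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # UCB1 score: mean return plus an exploration bonus that shrinks with use.
        scores = [
            m + math.sqrt(self.c * math.log(self.total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulated_return(sight_range, rng):
    # Toy stand-in for an episode return; by construction it peaks at range 3.
    return 1.0 - 0.2 * abs(sight_range - 3) + rng.gauss(0.0, 0.05)


rng = random.Random(0)
selector = UCBSightRangeSelector(sight_ranges=[1, 2, 3, 4, 5])
for _ in range(2000):
    arm = selector.select()
    selector.update(arm, simulated_return(selector.sight_ranges[arm], rng))

# The most frequently selected range is the bandit's current preference.
best = max(range(len(selector.counts)), key=selector.counts.__getitem__)
print(selector.sight_ranges[best], selector.counts)
```

In a real training loop, the selected range would set the agents' observation radius for the next episode or batch, and the observed return would be fed back through `update`. The per-range selection counts give the kind of interpretability the summary mentions, since they show which sight range was preferred as training progressed.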