🤖 AI Summary
To address the challenges of sample-inefficient training, slow simulation, and poor generalization in multi-agent reinforcement learning (MARL) for underwater acoustic multi-target tracking, this paper proposes an efficient cooperative control framework for autonomous underwater vehicle (AUV) swarms. Methodologically: (1) a GPU-accelerated, simplified simulation environment is designed, and knowledge from high-fidelity Gazebo simulation is transferred to it via iterative distillation; (2) TransfMAPPO, a Transformer-based multi-agent policy network, is introduced to make the learned policy invariant to the number of agents and targets, significantly improving sample efficiency; (3) curriculum learning is integrated to accelerate convergence. Experiments demonstrate a 30,000× speedup over Gazebo simulation. In high-fidelity Gazebo validation, the framework achieves long-term stable tracking of multiple fast-moving, maneuvering targets, with an average localization error below 5 meters.
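The speedup described above comes from stepping many simplified environment instances at once with array operations instead of one Gazebo instance at a time. The sketch below is a hypothetical illustration of that batching pattern, not the paper's actual environment: all names, the 2-D kinematic dynamics, the range-only observation, and the reward choice are assumptions, and NumPy stands in for the GPU array library (e.g. JAX or PyTorch) that would realize the acceleration in practice.

```python
import numpy as np

# Illustrative constants (not from the paper): 1024 parallel rollouts,
# 3 AUVs, 2 targets, 0.5 s timestep.
N_ENVS, N_AUVS, N_TARGETS, DT = 1024, 3, 2, 0.5

rng = np.random.default_rng(0)
auv_pos = rng.uniform(-50, 50, size=(N_ENVS, N_AUVS, 2))     # metres
tgt_pos = rng.uniform(-50, 50, size=(N_ENVS, N_TARGETS, 2))
tgt_vel = rng.uniform(-1, 1, size=(N_ENVS, N_TARGETS, 2))    # m/s

def step(auv_pos, tgt_pos, tgt_vel, actions):
    """Advance every environment one tick; actions are per-AUV velocities."""
    auv_pos = auv_pos + actions * DT
    tgt_pos = tgt_pos + tgt_vel * DT
    # Range-only acoustic observation: distance from each AUV to each target,
    # computed for all environments in one broadcasted operation.
    ranges = np.linalg.norm(
        auv_pos[:, :, None, :] - tgt_pos[:, None, :, :], axis=-1
    )                                                        # (N_ENVS, N_AUVS, N_TARGETS)
    # Stand-in reward: encourage some AUV to stay near each target.
    reward = -ranges.min(axis=1).mean(axis=-1)               # (N_ENVS,)
    return auv_pos, tgt_pos, ranges, reward

actions = rng.uniform(-2, 2, size=(N_ENVS, N_AUVS, 2))
auv_pos, tgt_pos, ranges, reward = step(auv_pos, tgt_pos, tgt_vel, actions)
print(ranges.shape, reward.shape)  # (1024, 3, 2) (1024,)
```

Because one `step` call advances all 1024 rollouts, wall-clock time per environment-step shrinks roughly with the batch size when the arrays live on a GPU, which is the mechanism behind the reported speedup.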
📝 Abstract
Autonomous vehicles (AVs) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs in complex marine environments. However, scaling these techniques to a fleet, which is essential for multi-target tracking or for targets with rapid, unpredictable motion, presents significant computational challenges. Multi-Agent Reinforcement Learning (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide 100x faster-than-real-time single-robot simulations, they offer no significant speedup for multi-vehicle scenarios, making MARL training impractical. To address these limitations, we propose an iterative distillation method that transfers high-fidelity simulations into a simplified, GPU-accelerated environment while preserving high-level dynamics. This approach achieves up to a 30,000x speedup over Gazebo through parallelization, enabling efficient training via end-to-end GPU acceleration. Additionally, we introduce a novel Transformer-based architecture (TransfMAPPO) that learns multi-agent policies invariant to the number of agents and targets, significantly improving sample efficiency. Following large-scale curriculum learning conducted entirely on GPU, we perform extensive evaluations in Gazebo, demonstrating that our method maintains tracking errors below 5 meters over extended durations, even in the presence of multiple fast-moving targets. This work bridges the gap between large-scale MARL training and high-fidelity deployment, providing a scalable framework for autonomous fleet control in real-world sea missions.
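The key property claimed for TransfMAPPO is that one set of weights handles any number of agents and targets. The generic mechanism that provides this is attention pooling: each observed target becomes a token, and an attention layer reduces the variable-length token set to a fixed-size feature. The sketch below shows only that generic pattern, not the paper's architecture; the dimensions, weight initialization, and function names are all illustrative assumptions.

```python
import numpy as np

D = 8  # feature width (illustrative)
rng = np.random.default_rng(1)
# Shared query/key/value projections, independent of how many targets exist.
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pool_targets(agent_feat, target_feats):
    """Cross-attention: one agent query attends over M target tokens."""
    q = agent_feat @ Wq                   # (D,)
    k = target_feats @ Wk                 # (M, D)
    v = target_feats @ Wv                 # (M, D)
    attn = softmax(q @ k.T / np.sqrt(D))  # (M,) attention weights
    return attn @ v                       # (D,) regardless of M

agent = rng.standard_normal(D)
for m in (2, 5, 11):                      # varying number of targets
    out = pool_targets(agent, rng.standard_normal((m, D)))
    print(m, out.shape)                   # feature size stays (8,)
```

Since the output shape does not depend on the number of tokens, the downstream policy head sees a fixed-size input whether there are two targets or eleven, which is what lets curriculum stages add agents and targets without changing the network.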