Towards Bio-inspired Heuristically Accelerated Reinforcement Learning for Adaptive Underwater Multi-Agents Behaviour

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of high dynamic uncertainty, limited communication, and slow training convergence in underwater multi-AUV cooperative area coverage tasks, this paper proposes a PSO-guided multi-agent reinforcement learning (MARL) framework. The method integrates particle swarm optimization (PSO)—a bio-inspired metaheuristic—into the Multi-Agent Soft Actor-Critic (MSAC) algorithm, enabling heuristic-guided exploration of high-value state-action regions during early training stages and thereby improving the exploration-exploitation trade-off. Leveraging deep neural networks for function approximation and continuous-control MARL techniques, the approach significantly reduces training interaction steps and accelerates convergence to optimal collaborative policies in 2D underwater coverage simulations. Experimental results demonstrate that the proposed method achieves comparable task performance while substantially enhancing training efficiency, offering a practical pathway for deploying MARL in real-world underwater robotic systems.
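The bio-inspired heuristic at the core of the framework is particle swarm optimization. As a rough illustration of the mechanism the summary describes (this is a generic PSO sketch, not the paper's implementation; the objective, bounds, and hyperparameters here are placeholders), each particle tracks a candidate solution, a velocity, and its personal best, and is pulled toward both its own best and the swarm's global best:

```python
import random

# Minimal PSO sketch (illustrative; not the paper's exact implementation).
# `objective` is any function to maximize; w, c1, c2 are the standard
# inertia, cognitive, and social coefficients.
def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best of the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the paper's setting, the swarm-derived solutions serve as heuristic suggestions that steer early exploration toward high-value state-action regions, rather than replacing the learned policy.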

📝 Abstract
This paper addresses the coordination of an autonomous multi-agent system (MAS) tasked with solving the coverage planning problem in a complex environment. The target applications are the detection and identification of objects of interest while covering an area; such tasks are highly relevant for space applications and are also of interest in various other domains, including the underwater context that is the focus of this study. In this context, coverage planning is traditionally modelled as a Markov Decision Process (MDP) in which a coordinated MAS, here a swarm of heterogeneous autonomous underwater vehicles, must survey an area and search for objects. This MDP comes with several challenges: environmental uncertainty, communication constraints, and a range of hazards, including time-varying and unpredictable changes in the underwater environment. MARL algorithms can solve highly non-linear problems using deep neural networks and scale well as the number of agents grows. Nevertheless, most current results in the underwater domain remain limited to simulation because of the long training times of MARL algorithms. For this reason, a novel strategy is introduced to accelerate convergence by incorporating biologically inspired heuristics that guide the policy during training. Particle swarm optimization (PSO), a method inspired by the collective behaviour of groups of animals, is selected as the heuristic. It allows the policy to explore the highest-quality regions of the state and action spaces from the beginning of training, optimizing the exploration/exploitation trade-off, so the resulting agent requires fewer interactions to reach optimal performance. The method is applied to the MSAC algorithm and evaluated on a 2D area-coverage mission in a continuous control environment.
Problem

Research questions and friction points this paper is trying to address.

Accelerating MARL convergence for underwater multi-agent systems
Enhancing coverage planning in complex environments
Reducing learning time with bio-inspired heuristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bio-inspired heuristics accelerate learning convergence
PSO method optimizes exploration-exploitation trade-off
MSAC algorithm enhances multi-agent coordination efficiency
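The exploration-exploitation idea in these contributions can be sketched as a simple blending rule (an assumption for illustration, not the paper's exact mechanism): early in training the agent follows the heuristic's suggested action with high probability, and this influence decays so the learned policy gradually takes over.

```python
import random

# Heuristic-accelerated action selection (illustrative sketch).
# `heuristic_action` would come from the bio-inspired heuristic (e.g. PSO);
# `policy_action` from the learned MSAC policy. The names and decay schedule
# here are assumptions, not taken from the paper.
def select_action(policy_action, heuristic_action, step, eps0=1.0, decay=0.999):
    eps = eps0 * (decay ** step)      # heuristic influence fades over training
    if random.random() < eps:
        return heuristic_action       # early on: exploit the heuristic's prior
    return policy_action              # later: rely on the learned policy
```

Because the heuristic only biases action selection and never alters the learning update itself, the agent can still converge to the policy the underlying MARL algorithm would find, just with fewer environment interactions.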