🤖 AI Summary
To address the challenges of prolonged training duration, opaque decision-making processes, and high deployment risks in multi-robot systems trained via multi-agent reinforcement learning (MARL), this paper proposes an LLM-Augmented MARL framework. Our method introduces, for the first time, an LLM-based linguistic negotiation mechanism among robots, dynamically coupling natural-language task coordination with policy learning to enable adaptive mode switching during training. We further design a language-driven action planning module and a hybrid decision architecture, wherein the LLM generates executable task plans and actively guides real-time RL policy updates. Experimental results demonstrate that, while maintaining equivalent task performance, the proposed framework reduces training episodes by an average of 42%. It also accelerates safe deployment from simulation to physical robots, outperforming pure RL baselines in real-world evaluations.
📝 Abstract
Multi-agent reinforcement learning is a key method for training multi-robot systems over a series of episodes in which robots are rewarded or punished according to their performance; only once the system is trained to a suitable standard is it deployed in the real world. An insufficiently trained system will likely fail to complete its task and could pose a risk to the surrounding environment. We introduce Multi-Agent Reinforcement Learning guided by Language-based Inter-Robot Negotiation (MARLIN), which requires fewer training episodes to reach peak performance. Robots are equipped with large language models that negotiate and debate a task, producing plans used to guide the policy during training. The approach dynamically switches between reinforcement learning and large language model-based action negotiation throughout training. This reduces the number of training episodes required, compared to standard multi-agent reinforcement learning, and hence allows the system to be deployed to physical hardware earlier. We evaluate the approach against standard multi-agent reinforcement learning and show that the hybrid method achieves comparable results with significantly reduced training time.
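The switching between reinforcement learning and negotiated plans can be illustrated with a minimal sketch. The abstract does not state the switching criterion, so the linear schedule below is purely an assumption for illustration, and the names `planner_probability` and `select_joint_action` are hypothetical. The negotiated plan is passed in as a plain list of actions; no LLM is called.

```python
import random
from typing import List, Tuple

JointAction = List[int]  # one discrete action per robot (illustrative encoding)


def planner_probability(episode: int, total_episodes: int) -> float:
    """Hypothetical schedule (an assumption, not the paper's rule):
    rely on the negotiated plan early in training and on the policy later."""
    return max(0.0, 1.0 - episode / total_episodes)


def select_joint_action(
    policy_action: JointAction,
    planned_action: JointAction,
    p_planner: float,
    rng: random.Random,
) -> Tuple[JointAction, str]:
    """Pick which source drives the robots on this step.

    Returns the chosen joint action and a label naming its source."""
    if rng.random() < p_planner:
        return planned_action, "planner"
    return policy_action, "policy"


if __name__ == "__main__":
    rng = random.Random(0)
    total = 100
    sources = []
    for episode in range(total):
        p = planner_probability(episode, total)
        # Placeholder joint actions standing in for the policy output
        # and the plan produced by inter-robot negotiation.
        _, source = select_joint_action([0, 1], [1, 1], p, rng)
        sources.append(source)
    print("planner used in first 20 episodes:", sources[:20].count("planner"))
    print("planner used in last 20 episodes:", sources[80:].count("planner"))
```

In a full training loop, the selected joint action would be executed in the environment and the resulting transition used for the usual policy update, which is how negotiated plans could guide the policy during training.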