CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in multi-agent reinforcement learning (MARL) for collaborative robotics (high-dimensional continuous action spaces, non-stationarity, and complex reward design), this paper proposes a foundation-model-based closed-loop curriculum learning framework. The method leverages large language models (LLMs) to autonomously plan long-horizon task sequences and decompose them into trainable subtasks, while vision-language models (VLMs) dynamically refine sparse reward functions. Crucially, foundation models serve as "intelligent coaches," enabling joint, autonomous task decomposition and reward shaping for the first time. Integrated with hierarchical reinforcement learning, the framework is validated on multi-quadruped navigation and bimanual cooperative manipulation tasks and successfully transferred to real-world robotic hardware. Results demonstrate significantly reduced human intervention and efficient end-to-end learning of complex collaborative behaviors.

📝 Abstract
Multi-Agent Reinforcement Learning (MARL) provides a powerful framework for learning coordination in multi-agent systems. However, applying MARL to robotics remains challenging due to high-dimensional continuous joint action spaces, complex reward design, and non-stationary transitions inherent to decentralized settings. Humans, by contrast, learn complex coordination through staged curricula, where long-horizon behaviors are progressively built upon simpler skills. Motivated by this, we propose CRAFT: Coaching Reinforcement learning Autonomously using Foundation models for multi-robot coordination Tasks, a framework that leverages the reasoning capabilities of foundation models to act as a "coach" for multi-robot coordination. CRAFT automatically decomposes long-horizon coordination tasks into sequences of subtasks using the planning capability of Large Language Models (LLMs). CRAFT then trains each subtask using reward functions generated by the LLM, and refines them through a Vision Language Model (VLM)-guided reward-refinement loop. We evaluate CRAFT on multi-quadruped navigation and bimanual manipulation tasks, demonstrating its capability to learn complex coordination behaviors. In addition, we validate the multi-quadruped navigation policy in real-world hardware experiments.
Problem

Research questions and friction points this paper is trying to address.

Automating multi-robot coordination task decomposition
Addressing complex reward design in MARL robotics
Overcoming non-stationary transitions in decentralized systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to decompose tasks into subtasks
Employs LLM-generated rewards for subtask training
Refines rewards via VLM-guided feedback loop
👥 Authors
Seoyeon Choi (Mechanical Engineering, University of California, Berkeley)
Kanghyun Ryu (Mechanical Engineering, University of California, Berkeley)
Jonghoon Ock (Mechanical Engineering, University of California, Berkeley)
Negar Mehr (Assistant Professor, University of California, Berkeley)
Control Theory · Game Theory · Robotics