CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport

📅 2026-02-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of perceptual inconsistency in vision-language-guided cooperative transport among multiple robots, which arises from divergent viewpoints and linguistic ambiguity. To this end, the authors propose the CoLF framework, which employs a dependent leader-follower architecture integrating an asymmetric policy network and a mutual-information-maximization mechanism. This design enables the follower to predict the leader's actions from its local observations, thereby achieving stable role assignment and consistent collaboration. The approach is formulated within a multi-agent reinforcement learning setting and optimized under the centralized training with decentralized execution (CTDE) paradigm by maximizing a variational lower bound on the mutual information. Experimental results on both simulated and real quadruped-robot platforms demonstrate that the proposed method significantly improves task success rates and collaborative stability.
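As a concrete illustration of the mutual-information mechanism described above, here is a minimal PyTorch-style sketch. It assumes a discrete leader action space and a flat follower observation vector; the names (LeaderActionPredictor, mi_aux_loss, beta) are illustrative and not taken from the paper. Maximizing the log-likelihood of the leader's action under a variational predictor conditioned on the follower's observation is the standard way to optimize a variational lower bound on the mutual information between the two.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeaderActionPredictor(nn.Module):
    """Variational predictor q_phi(a_leader | o_follower).

    Hypothetical module; the paper's actual architecture is not specified
    on this page. Assumes a flat follower observation and discrete leader actions.
    """

    def __init__(self, obs_dim: int, n_leader_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_leader_actions),
        )

    def forward(self, follower_obs: torch.Tensor) -> torch.Tensor:
        # Logits over the leader's discrete actions.
        return self.net(follower_obs)


def mi_aux_loss(predictor: LeaderActionPredictor,
                follower_obs: torch.Tensor,
                leader_actions: torch.Tensor) -> torch.Tensor:
    """Negative sample estimate of E[log q_phi(a_L | o_F)].

    Minimizing this cross-entropy maximizes the variational lower bound on
    I(a_L; o_F); the entropy term H(a_L) does not depend on phi.
    """
    logits = predictor(follower_obs)
    return F.cross_entropy(logits, leader_actions)


# Usage sketch (hypothetical weighting `beta`): added to the RL loss during
# centralized training, while execution remains decentralized.
# total_loss = rl_loss + beta * mi_aux_loss(predictor, o_follower, a_leader)
```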

📝 Abstract
In this study, we address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions from onboard camera observations. A key challenge in this decentralized setting is perceptual misalignment across robots, where viewpoint differences and language ambiguity can yield inconsistent interpretations and degrade cooperative transport. To mitigate this problem, we adopt a dependent leader-follower design, where one robot serves as the leader and the other as the follower. Although such a leader-follower structure appears straightforward, learning with independent and symmetric agents often yields symmetric or unstable behaviors without explicit inductive biases. To address this challenge, we propose Consistent Leader-Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader-follower role differentiation. CoLF consists of two key components: (1) an asymmetric policy design that induces leader-follower role differentiation, and (2) a mutual-information-based training objective that maximizes a variational lower bound, encouraging the follower to predict the leader's action from its local observation. The leader and follower policies are jointly optimized under the centralized training and decentralized execution (CTDE) framework to balance task execution and consistent cooperative behaviors. We validate CoLF in both simulation and real-robot experiments using two quadruped robots. The demonstration video is available at https://sites.google.com/view/colf/.
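For reference, the variational lower bound mentioned in the abstract is typically of the Barber-Agakov form sketched below. The paper's exact objective and notation are not reproduced on this page, so the symbols ($a^{L}$ for the leader's action, $o^{F}$ for the follower's local observation, $q_{\phi}$ for the follower's variational predictor) are illustrative:

$$
I\left(a^{L}; o^{F}\right) \;\ge\; \mathcal{H}\left(a^{L}\right) \;+\; \mathbb{E}_{p\left(a^{L},\, o^{F}\right)}\!\left[\log q_{\phi}\!\left(a^{L} \mid o^{F}\right)\right]
$$

Maximizing the right-hand side with respect to $q_{\phi}$ (and, during centralized training, the policies that shape $p(a^{L}, o^{F})$) encourages the follower to predict the leader's action from its own observation, which matches the consistency objective described in the abstract.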
Problem

Research questions and friction points this paper is trying to address.

multi-robot cooperative transport
vision-language grounding
perceptual misalignment
leader-follower coordination
decentralized execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

leader-follower
multi-agent reinforcement learning
perceptual alignment
mutual information
vision-language grounding
Joachim Yann Despature
Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), Nara, Japan; École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland
Kazuki Shibata
Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology (NAIST), Nara, Japan
Takamitsu Matsubara
Nara Institute of Science and Technology
Robot Learning · Machine Learning · Reinforcement Learning · Robotics