🤖 AI Summary
This work addresses the challenges of multi-agent collaboration in complex dynamic environments, particularly in task allocation, coordinated execution, and human–robot interaction. To advance this field, the authors launched the MARS Challenge, the first systematic initiative at a NeurIPS Workshop to promote collaborative planning and control in embodied multi-agent AI. The challenge combines vision–language models (VLMs) for high-level task planning with policy execution for robotic manipulation. Evaluation of the competition submissions yielded insights into scalable architectures, efficient coordination mechanisms, and human–agent interaction for embodied multi-agent systems, contributing to the development of collaborative AI.
📝 Abstract
Recent advancements in multimodal large language models and vision-language-action models have significantly driven progress in Embodied AI. As the field transitions toward more complex task scenarios, multi-agent system frameworks are becoming essential for achieving scalable, efficient, and collaborative solutions. This shift is fueled by three primary factors: increasing agent capabilities, enhancing system efficiency through task delegation, and enabling advanced human-agent interactions. To address the challenges posed by multi-agent collaboration, we propose the Multi-Agent Robotic System (MARS) Challenge, held at the NeurIPS 2025 Workshop on SpaVLE. The competition focuses on two critical areas: planning and control, where participants explore multi-agent embodied planning using vision-language models (VLMs) to coordinate tasks and policy execution to perform robotic manipulation in dynamic environments. By evaluating solutions submitted by participants, the challenge provides valuable insights into the design and coordination of embodied multi-agent systems, contributing to the future development of advanced collaborative AI systems.