Safe and Interpretable Multimodal Path Planning for Multi-Agent Cooperation

πŸ“… 2026-02-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses collision and coordination failures in multi-agent cooperative path planning, which often arise from the difficulty of accurately predicting other agents' intentions. To this end, the authors propose CaPE (Code as Path Editor), an approach that couples vision-language models (VLMs) with model-based planning. CaPE uses a VLM to synthesize interpretable path-editing programs from natural language communication; a model-based planner then verifies the edited paths for safety, establishing an interpretable mapping from language commands to safe trajectory updates. The method supports open-ended multimodal cooperation and works as a plug-and-play module in both simulated and real-world human-robot and multi-robot scenarios, improving language grounding, response safety, and planning interpretability.

πŸ“ Abstract
Successful cooperation among decentralized agents requires each agent to quickly adapt its plan to the behavior of other agents. In scenarios where agents cannot confidently predict one another's intentions and plans, language communication can be crucial for ensuring safety. In this work, we focus on path-level cooperation in which agents must adapt their paths to one another in order to avoid collisions or perform physical collaboration such as joint carrying. In particular, we propose a safe and interpretable multimodal path planning method, CaPE (Code as Path Editor), which generates and updates path plans for an agent based on the environment and language communication from other agents. CaPE leverages a vision-language model (VLM) to synthesize a path editing program verified by a model-based planner, grounding communication to path plan updates in a safe and interpretable way. We evaluate our approach in diverse simulated and real-world scenarios, including multi-robot and human-robot cooperation in autonomous driving, household, and joint carrying tasks. Experimental results demonstrate that CaPE can be integrated into different robotic systems as a plug-and-play module, greatly enhancing a robot's ability to align its plan to language communication from other robots or humans. We also show that the combination of the VLM-based path editing program synthesis and model-based planning safety enables robots to achieve open-ended cooperation while maintaining safety and interpretability.
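To make the "code as path editor" idea concrete, below is a rough, hypothetical sketch of the loop the abstract describes: a VLM proposes a path edit as a small program, and a model-based safety check decides whether the edited path replaces the current plan. Every name here (Waypoint, is_collision_free, edit_path, apply_verified_edit) is illustrative and not taken from the paper; the real system's program synthesis and planner are far richer than this.

```python
# Hypothetical sketch of a CaPE-style pipeline: a VLM-synthesized path-editing
# program is applied to the current plan, and a model-based planner stand-in
# verifies the result before it is committed.
from dataclasses import dataclass


@dataclass
class Waypoint:
    x: float
    y: float


def is_collision_free(path: list[Waypoint], obstacles: list[Waypoint],
                      clearance: float = 0.5) -> bool:
    """Stand-in for the model-based planner's safety verification."""
    return all((wp.x - ob.x) ** 2 + (wp.y - ob.y) ** 2 >= clearance ** 2
               for wp in path for ob in obstacles)


# A path-editing program as a VLM might synthesize it for the message
# "shift your path half a meter to the right so we can pass":
def edit_path(path: list[Waypoint]) -> list[Waypoint]:
    return [Waypoint(wp.x + 0.5, wp.y) for wp in path]


def apply_verified_edit(path, edit, obstacles):
    """Commit the language-grounded edit only if the planner accepts it."""
    candidate = edit(path)
    return candidate if is_collision_free(candidate, obstacles) else path


current_path = [Waypoint(0.0, float(i)) for i in range(5)]
obstacles = [Waypoint(-0.3, 2.0)]  # another agent approaching on the left
current_path = apply_verified_edit(current_path, edit_path, obstacles)
print([(wp.x, wp.y) for wp in current_path])  # the shifted, verified path
```

The split mirrors the division of labor the paper emphasizes: the generated program stays human-readable (interpretability), while the planner retains veto power over unsafe edits (safety).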
Problem

Research questions and friction points this paper is trying to address.

multimodal path planning · multi-agent cooperation · language communication · safety · interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal path planning · vision-language model · code as path editor · safe cooperation · interpretable planning
Authors

Haojun Shi, Johns Hopkins University
Suyu Ye, Johns Hopkins University
Katherine M. Guerrerio, Johns Hopkins University
Jianzhi Shen, Johns Hopkins University
Yifan Yin, Johns Hopkins University
Daniel Khashabi, Johns Hopkins University (Natural Language Processing, Artificial Intelligence, Machine Learning)
Chien-Ming Huang, Johns Hopkins University (Human-Robot Interaction, Human-Computer Interaction, Social Robotics)
Tianmin Shu, Assistant Professor, JHU (Artificial Intelligence, Cognitive Science)