Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key bottlenecks in multi-robot language-conditioned motion planning, namely poor generalization of diffusion models, high inference overhead, reliance on explicit environment representations, and the lack of geometric reachability reasoning, this paper proposes LCHD, an end-to-end vision-driven framework. LCHD eliminates conventional obstacle inputs and explicit environment modeling, directly processing RGB images and natural language instructions to generate collision-free trajectories. Its core innovation is a heat-equation-inspired diffusion kernel used as a physics-informed prior and tightly coupled with CLIP-based semantic encoding, enabling reachability-aware language understanding and robust out-of-distribution generalization. Evaluated across diverse real-world-inspired maps and on physical robot platforms, LCHD achieves significantly higher task success rates, reduces inference latency by an order of magnitude, and operates without any obstacle information at runtime.

📝 Abstract
Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency.
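The paper's own kernel is not specified here, but the underlying intuition of a heat-equation-based reachability prior can be sketched independently: diffuse "heat" from a goal cell across an occupancy grid while keeping obstacle cells cold, so that after enough steps the temperature is positive exactly on cells the goal can reach through free space. The grid setup, step count, and `heat_diffuse` function below are illustrative assumptions, not the LCHD implementation.

```python
import numpy as np

def heat_diffuse(occupancy, source, steps=200, alpha=0.2):
    """Explicit finite-difference heat diffusion on a 2D grid.

    occupancy: 2D bool array, True = obstacle (clamped to zero heat,
               i.e. an absorbing boundary).
    source:    (row, col) of the heat source, e.g. a goal cell.

    Heat only spreads through free cells, so the final field is a
    cheap proxy for geometric reachability from the source.
    """
    u = np.zeros(occupancy.shape)
    free = ~occupancy
    for _ in range(steps):
        u[source] = 1.0  # keep the source at fixed temperature
        # Shifted copies of u give the four neighbour values,
        # with zero-padding acting as a cold domain border.
        up    = np.pad(u, ((1, 0), (0, 0)))[:-1, :]
        down  = np.pad(u, ((0, 1), (0, 0)))[1:, :]
        left  = np.pad(u, ((0, 0), (1, 0)))[:, :-1]
        right = np.pad(u, ((0, 0), (0, 1)))[:, 1:]
        lap = up + down + left + right - 4 * u  # discrete Laplacian
        # Update free cells; obstacle cells stay cold.
        u = np.where(free, u + alpha * lap, 0.0)
    u[source] = 1.0
    return u
```

With `alpha <= 0.25` the explicit scheme is stable; a cell walled off from the source ends at exactly zero, which is the sense in which such a kernel can "interpret language commands strictly within the reachable workspace".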
Problem

Research questions and friction points this paper is trying to address.

Multi-robot motion planning with language-conditioned tasks
Reducing computational cost and improving generalization in diffusion models
Generating collision-free trajectories using vision and semantic priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-Conditioned Heat-Inspired Diffusion for vision-based planning
Integrates CLIP semantic priors with collision-avoiding diffusion kernel
Eliminates need for explicit obstacle information at inference
Jebeom Chae
Yonsei University, Department of Artificial Intelligence
Junwoo Chang
Yonsei University, School of Mechanical Engineering
Seungho Yeom
Yonsei University, School of Mechanical Engineering
Yujin Kim
Yonsei University, School of Mechanical Engineering
Jongeun Choi
Professor of Mechanical Engineering, Yonsei University
Machine Learning, Robot Learning, Systems and Control, AI in Healthcare