🤖 AI Summary
To address key bottlenecks in multi-robot language-conditioned motion planning—namely, poor generalization of diffusion models, high inference overhead, and reliance on explicit environmental modeling and geometric reachability priors—this paper proposes LCHD, an end-to-end vision-driven framework. LCHD eliminates conventional obstacle inputs and explicit environment representations, directly processing RGB images and natural language instructions to generate collision-free trajectories. Its core innovation lies in integrating a heat-equation-inspired diffusion kernel as a physics-informed prior, tightly coupled with CLIP-based semantic encoding, enabling reachability-aware language understanding and robust out-of-distribution generalization. Evaluated across diverse real-world-inspired maps and on physical robot platforms, LCHD achieves significantly higher task success rates, reduces inference latency by an order of magnitude, and operates entirely without runtime obstacle information.
📝 Abstract
Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency.
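The paper's details are not reproduced here, but the core intuition behind a heat-equation-based, collision-avoiding diffusion kernel can be sketched independently: on an occupancy grid, simulating the heat equation with zero flux across obstacle boundaries spreads probability mass only through free space, so the resulting kernel never places mass inside (or beyond) obstacles. The grid, step size, and update rule below are illustrative assumptions, not LCHD's actual implementation.

```python
import numpy as np

def heat_step(p, free, alpha=0.2):
    """One explicit finite-difference heat-equation step on a 2D grid.

    Flux is exchanged only between pairs of free cells, so obstacle
    edges act as reflecting (no-flux) boundaries and total probability
    mass is conserved. alpha <= 0.25 keeps the explicit scheme stable.
    """
    p = p * free  # mass lives only on free cells
    new = p.copy()
    # Exchange flux with each of the 4 grid neighbors.
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        q = np.roll(p, shift, axis=axis)       # neighbor's mass
        f = np.roll(free, shift, axis=axis)    # neighbor's free mask
        new += alpha * (q - p) * free * f      # zero flux across obstacles
    return new

# Toy map: obstacle border plus a vertical wall splitting the grid.
free = np.ones((9, 9))
free[[0, -1], :] = 0.0
free[:, [0, -1]] = 0.0
free[:, 4] = 0.0          # the wall

p = np.zeros((9, 9))
p[4, 2] = 1.0             # unit mass starts left of the wall
for _ in range(50):
    p = heat_step(p, free)
# Mass has diffused over the left region only; none crossed the wall.
```

Under such a kernel, the forward noising process itself respects workspace geometry, which is one way to read the abstract's claim that reachability reasoning is baked in as a physical inductive bias rather than handled by explicit obstacle inputs at inference time.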