ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion

📅 2025-09-22
🤖 AI Summary
Natural-language navigation in dynamic environments generalizes poorly because the number of possible instruction variations grows combinatorially. Method: This paper proposes ComposableNav, a composable diffusion framework that decomposes a navigation instruction into its constituent specifications, learns a separate diffusion model for the motion primitive matching each specification, and composes those models in parallel at deployment time, enabling generalization to specification combinations unseen in training. A two-stage training strategy (supervised pre-training of a base model, followed by reinforcement learning fine-tuning into individual primitives) avoids the need for per-primitive demonstration data. Contribution/Results: In simulation and real-world robot experiments, the method significantly outperforms non-compositional VLM-based policies and costmap-composing baselines on diverse and unseen combinations of specifications.

📝 Abstract
This paper considers the problem of enabling robots to navigate dynamic environments while following instructions. The challenge lies in the combinatorial nature of instruction specifications: each instruction can include multiple specifications, and the number of possible specification combinations grows exponentially as the robot's skill set expands. For example, "overtake the pedestrian while staying on the right side of the road" consists of two specifications: "overtake the pedestrian" and "walk on the right side of the road." To tackle this challenge, we propose ComposableNav, based on the intuition that following an instruction involves independently satisfying its constituent specifications, each corresponding to a distinct motion primitive. Using diffusion models, ComposableNav learns each primitive separately, then composes them in parallel at deployment time to satisfy novel combinations of specifications unseen in training. Additionally, to avoid the onerous need for demonstrations of individual motion primitives, we propose a two-stage training procedure: (1) supervised pre-training to learn a base diffusion model for dynamic navigation, and (2) reinforcement learning fine-tuning that molds the base model into different motion primitives. Through simulation and real-world experiments, we show that ComposableNav enables robots to follow instructions by generating trajectories that satisfy diverse and unseen combinations of specifications, significantly outperforming both non-compositional VLM-based policies and costmap composing baselines. Videos and additional materials can be found on the project page: https://amrl.cs.utexas.edu/ComposableNav/
Problem

Research questions and friction points this paper is trying to address.

Enabling robots to navigate dynamic environments while following complex instructions
Addressing combinatorial explosion of instruction specifications as skills expand
Satisfying novel combinations of motion specifications unseen during training
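
The growth described above can be illustrated with a small counting sketch. It assumes, purely for illustration, that every specification is either present or absent in an instruction and that every non-empty subset is a valid instruction; the third primitive name is hypothetical and does not come from the paper.

```python
from itertools import combinations


def count_instruction_combinations(num_primitives: int) -> int:
    """Number of non-empty subsets of primitives (assumes all subsets are valid)."""
    return 2 ** num_primitives - 1


def enumerate_combinations(primitives):
    """List every non-empty combination of the given primitive names."""
    combos = []
    for k in range(1, len(primitives) + 1):
        combos.extend(combinations(primitives, k))
    return combos


if __name__ == "__main__":
    primitives = [
        "overtake the pedestrian",
        "stay on the right side of the road",
        "avoid the marked region",  # hypothetical primitive, for illustration only
    ]
    for combo in enumerate_combinations(primitives):
        print(" + ".join(combo))
    # One model per primitive covers all of these combinations when composed.
    for n in (3, 5, 10):
        print(n, "primitives ->", count_instruction_combinations(n), "combinations")
```

Under these assumptions a non-compositional policy would need training coverage for every combination, while a compositional approach only needs one model per primitive.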
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses composable diffusion models for navigation instruction following
Learns motion primitives separately then composes them in parallel
Employs two-stage training with supervised pre-training and RL fine-tuning
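
As a rough illustration of parallel composition, the sketch below combines the noise predictions of several diffusion models at a single denoising step. It uses a weighted-sum rule that is common in the general composable-diffusion literature; this page does not state the exact rule ComposableNav uses, so the formula, the function name `compose_noise`, the weights, and the toy arrays are all assumptions.

```python
import numpy as np


def compose_noise(eps_base, eps_primitives, weights):
    """Combine noise predictions as eps_base + sum_i w_i * (eps_i - eps_base).

    eps_base:       noise predicted by the base model, shape (T, 2)
    eps_primitives: list of noise predictions, one per primitive model
    weights:        list of scalar composition weights (assumed, not from the paper)
    """
    composed = eps_base.astype(float)  # astype returns a copy, so the input is untouched
    for eps_i, w_i in zip(eps_primitives, weights):
        composed += w_i * (eps_i - eps_base)
    return composed


if __name__ == "__main__":
    horizon = 4  # toy trajectory with 4 waypoints in (x, y)
    eps_base = np.zeros((horizon, 2))
    eps_overtake = np.ones((horizon, 2))          # stand-in for an "overtake" primitive
    eps_keep_right = 2.0 * np.ones((horizon, 2))  # stand-in for a "keep right" primitive

    eps = compose_noise(eps_base, [eps_overtake, eps_keep_right], [1.0, 0.5])
    print(eps)
```

In a full sampler, the composed prediction would replace the single-model prediction at every reverse-diffusion step, so all primitives influence the same trajectory sample simultaneously.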
Zichao Hu
Department of Computer Science, The University of Texas at Austin
Chen Tang
Department of Computer Science, The University of Texas at Austin
Michael J. Munje
Department of Computer Science, The University of Texas at Austin
Yifeng Zhu
Department of Computer Science, The University of Texas at Austin
Alex Liu
University of Washington
AI in education, Strategic Teacher Engagement, K-12 education policy
Shuijing Liu
Postdoc, The University of Texas at Austin
Robot Learning, Human Robot Interaction
Garrett Warnell
Research Scientist, Army Research Laboratory
Machine Learning, Robotics, Artificial Intelligence
Peter Stone
Department of Computer Science, The University of Texas at Austin; Sony AI
Joydeep Biswas
Associate Professor, Computer Science Department, The University of Texas at Austin
Robotics, Artificial Intelligence, Multi Robot Systems, Localization, Mapping