DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses insufficient synchronization and the lack of dynamic coordination in long-horizon multi-robot collaborative vision-and-language navigation by proposing a decentralized, dialogue-driven framework. An event-triggered semantic state exchange lets robots reallocate subgoals dynamically and replan in sync without a central controller, supporting real-time adaptive collaboration. Key elements include a fully decentralized architecture, event-triggered inter-agent dialogue, semantic communication, and graph-grounded evaluation. On DeCoNavBench, the approach improves the both-success rate (BSR) of dual-robot teams by 69.2%, supporting the effectiveness of the proposed dynamic coordination mechanism.
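
The success metric mentioned above can be illustrated with a minimal sketch. It assumes the rate is simply the fraction of episodes in which both robots succeed; the function name and the toy episode data are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple


def both_success_rate(episodes: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of episodes in which robot A and robot B both succeed.

    Assumed definition for illustration; the paper may define the
    metric with additional conditions (e.g. timing constraints).
    """
    episodes = list(episodes)
    if not episodes:
        return 0.0
    both = sum(1 for a_ok, b_ok in episodes if a_ok and b_ok)
    return both / len(episodes)


# Toy data (made up): one (robot_a_success, robot_b_success) pair per episode.
toy_episodes = [(True, True), (True, False), (False, False), (True, True)]
rate = both_success_rate(toy_episodes)
```

Under this assumed definition, an episode in which only one robot reaches its goal counts as a failure, which is why the metric is stricter than per-robot success.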

📝 Abstract

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems to accomplish complex tasks beyond the capability of a single agent. CoNavBench takes a first step by introducing the first collaborative long-horizon VLN benchmark, with relay-style multi-robot tasks, a collaboration taxonomy, and graph-grounded generation and evaluation to model handoffs and rendezvous in shared environments. However, existing benchmarks and evaluations often do not enforce strictly synchronized dual-robot rollout on a shared world timeline, and they typically rely on static coordination policies that cannot adapt when new cross-agent evidence emerges. We present Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation (DeCoNav), a decentralized framework that couples event-triggered dialogue with dynamic task allocation and replanning for real-time, adaptive coordination. In DeCoNav, robots exchange compact semantic states via dialogue without a central controller. When informative events such as new evidence, uncertainty, or conflicts arise, dialogue is triggered to dynamically reassign subgoals and replan under synchronized execution. Implemented in DeCoNavBench, which comprises 1,213 tasks across 176 HM3D scenes, DeCoNav improves the both-success rate (BSR) by 69.2%, demonstrating the effectiveness of dialogue-driven, dynamically reallocated planning for multi-robot collaboration.
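
The event-triggered dialogue described in the abstract might be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the `SemanticState` fields, the uncertainty threshold, and the nearest-robot reallocation rule are all assumptions introduced here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class SemanticState:
    """Compact state a robot shares with its peer (hypothetical fields)."""
    robot_id: str
    position: Point
    remaining_subgoals: List[str]
    uncertainty: float                  # assumed to lie in [0, 1]
    new_evidence: Optional[str] = None  # e.g. a newly observed landmark


def should_trigger_dialogue(own: SemanticState, peer: SemanticState,
                            uncertainty_threshold: float = 0.6) -> bool:
    """Trigger on new evidence, high uncertainty, or a subgoal conflict."""
    has_evidence = own.new_evidence is not None
    is_uncertain = own.uncertainty > uncertainty_threshold
    has_conflict = bool(set(own.remaining_subgoals) & set(peer.remaining_subgoals))
    return has_evidence or is_uncertain or has_conflict


def reallocate(subgoal_positions: Dict[str, Point],
               states: List[SemanticState]) -> Dict[str, List[str]]:
    """Assumed rule: give each pending subgoal to the nearest robot."""
    plan: Dict[str, List[str]] = {s.robot_id: [] for s in states}
    for name, pos in sorted(subgoal_positions.items()):
        nearest = min(states, key=lambda s: math.dist(s.position, pos))
        plan[nearest.robot_id].append(name)
    return plan


# Toy scenario: both robots currently hold "sofa", which is a conflict.
robot_a = SemanticState("A", (0.0, 0.0), ["kitchen", "sofa"], uncertainty=0.2)
robot_b = SemanticState("B", (10.0, 0.0), ["sofa"], uncertainty=0.1)
goals = {"kitchen": (1.0, 0.0), "sofa": (9.0, 0.0)}

if should_trigger_dialogue(robot_a, robot_b):
    new_plan = reallocate(goals, [robot_a, robot_b])
```

Each robot evaluates the trigger locally against the last state it received from its peer, so no central controller is involved in this sketch. A real system would presumably replace the distance heuristic with a planner that accounts for scene geometry and the shared timeline.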

Problem

Research questions and friction points this paper is trying to address.

collaborative vision-language navigation

long-horizon navigation

multi-robot coordination

dynamic task allocation

synchronized execution

Innovation

Methods, ideas, or system contributions that make the work stand out.

dialogue-driven coordination

dynamic task reallocation

event-triggered dialogue