Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue

2026-05-12
Citations: 0
Influential: 0
AI Summary
In partially observable environments, embodied multi-agent systems struggle to coordinate through actions alone; communication can, in principle, close this gap by letting agents share observations and align their world models. This work adds a natural-language dialogue channel to the PARTNR benchmark, yielding a two-agent setting that combines LLM-based agents, embodiment, and partial observability. To distinguish superficial coordination from genuine world-model alignment, the authors propose an evaluation framework defined over per-agent world graphs, comprising three metrics: observation convergence, information novelty, and belief-sensitive messaging. Across three LLMs, dialogue reduces action conflicts by 40 to 83 percentage points, yet task success is lower than with silent coordination, suggesting that current systems have not yet achieved genuine world-model alignment.
Abstract
Effective collaboration between embodied agents requires more than acting in a shared environment; it demands communication grounded in each agent's evolving understanding of the world. When agents can only partially observe their surroundings, coordination without communication is provably hard, but communication can, in principle, bridge this gap by allowing agents to share observations and align their world models. In this work, we examine whether LLM-based embodied agents actually realize the ability to communicate. We extend PARTNR, a benchmark for collaborative household robotics, with a natural-language dialogue channel that enables two agents with partial observability to communicate during task execution. To evaluate whether dialogue leads to genuine world-model alignment rather than superficial coordination, we propose a framework for measuring world-model alignment defined over per-agent world graphs: observation convergence (do private world models align over time?), information novelty (do messages convey what the partner lacks?), and belief-sensitive messaging (do agents model what their partner knows?). Our experiments across three LLMs reveal that dialogue reduces action conflicts by 40 to 83 percentage points but degrades task success relative to silent coordination. Using our metrics, we characterize the gap between superficial coordination and genuine world-model alignment, and identify where current models fall on this spectrum.
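The abstract names the three metrics but does not give their formulas. As a rough illustration of how such quantities could be computed over per-agent world graphs, the sketch below makes several assumptions that are not from the paper: each world graph is a set of (subject, relation, object) facts, convergence is measured as Jaccard overlap, and novelty is the fraction of message facts missing from the receiver's graph. All function names are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a world graph is a set of (subject, relation, object) facts.
Fact = Tuple[str, str, str]


def observation_convergence(graph_a: Set[Fact], graph_b: Set[Fact]) -> float:
    """Jaccard overlap of two private world graphs (assumed proxy, not the paper's formula)."""
    union = graph_a | graph_b
    if not union:
        return 1.0  # two empty graphs are trivially identical
    return len(graph_a & graph_b) / len(union)


def information_novelty(message_facts: Set[Fact], partner_graph: Set[Fact]) -> float:
    """Fraction of facts in a message that the partner does not already hold."""
    if not message_facts:
        return 0.0
    return len(message_facts - partner_graph) / len(message_facts)


def belief_sensitivity(message_facts: Set[Fact], believed_partner_graph: Set[Fact]) -> float:
    """Fraction of sent facts the sender believed the partner lacked.

    Same computation as novelty, but scored against the sender's estimate
    of the partner's knowledge rather than the partner's true graph.
    """
    return information_novelty(message_facts, believed_partner_graph)


# Toy example with made-up household facts.
agent_a = {("cup", "on", "table"), ("fridge", "in", "kitchen")}
agent_b = {("fridge", "in", "kitchen"), ("plate", "in", "sink")}
message = {("cup", "on", "table"), ("fridge", "in", "kitchen")}

convergence = observation_convergence(agent_a, agent_b)  # 1 shared fact out of 3
novelty = information_novelty(message, agent_b)          # 1 new fact out of 2
```

Tracking the convergence score at each timestep would show whether the private graphs drift together over an episode, while a gap between novelty and belief sensitivity would indicate that a sender's model of its partner is inaccurate.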
Problem

Research questions and friction points this paper is trying to address.

embodied agents
world model alignment
multi-agent coordination
partial observability
dialogue
Innovation

Methods, ideas, or system contributions that make the work stand out.

world-model alignment
embodied multi-agent coordination
natural-language dialogue
partial observability
LLM-based agents