Sentinel: Embodied Cooperative Spatial Reasoning and Planning

📅 2026-05-25
🤖 AI Summary
This work addresses the challenge of enabling decentralized embodied agents to rendezvous safely and efficiently through natural language communication in city-scale dynamic environments. The authors propose CoSaR (Cooperative Spatial Reasoning and Planning), a framework that accompanies their formalization of Cooperative Spatial Intelligence as a new setting. CoSaR combines the high-level reasoning and communication abilities of foundation models with dynamic spatial constraint modeling and classical navigation algorithms, so that agents can exchange situational updates, reason jointly over evolving spatial constraints, and replan their paths. Evaluated on the newly introduced Sentinel Challenge benchmark across 14 city-level scenes with 3-5 agents, CoSaR consistently yields faster gathering, shorter paths, and safer avoidance of patrolling sentinels, which supports the claim that coupling communication with spatial reasoning is essential for robust multi-agent cooperation.
📝 Abstract
In this work, we study Cooperative Spatial Intelligence, the ability of decentralized embodied agents to coordinate effectively under dynamic environmental constraints across city-scale outdoor domains. We introduce Sentinel Challenge, a benchmark where multiple decentralized embodied agents must communicate in natural language to agree on a mutually safe and convenient meeting point within large, city-scale outdoor environments. Each agent must then navigate safely while avoiding dynamic sentinels patrolling the area, using a tool that provides coarse spatial information. To address this, we propose CoSaR (Cooperative Spatial Reasoning and Planning), a framework that bridges the high-level communication and planning abilities of foundation models with the precision of classical spatial navigation algorithms. CoSaR enables agents to exchange situational updates, reason over evolving spatial constraints, and collaboratively replan trajectories. Evaluated across 14 city-level scenes with 3-5 agents, CoSaR consistently leads to faster gathering, shorter path lengths, and improved safety. Our results demonstrate that integrating dynamic communication with spatial reasoning is essential for robust multi-agent cooperation. By formalizing this new setting and providing a scalable benchmark, we aim to build a foundation for advancing cooperative spatial intelligence in embodied multi-agent systems. Code and challenge are available at https://github.com/UMass-Embodied-AGI/Sentinel.
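The rendezvous problem described in the abstract can be illustrated with a toy example. The sketch below is not CoSaR and does not use the Sentinel Challenge code; it assumes a small grid map, treats sentinel positions as temporarily blocked cells, and picks the meeting cell that minimizes the longest path any agent must travel. The function names `bfs_distances` and `choose_meeting_point`, the grid encoding (`#` for walls), and the minimax criterion are all illustrative assumptions.

```python
from collections import deque


def bfs_distances(grid, start, blocked):
    """Return step counts from start to every reachable free cell (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the map
            if grid[nr][nc] == "#" or nxt in blocked or nxt in dist:
                continue  # wall, sentinel-occupied, or already visited
            dist[nxt] = dist[(r, c)] + 1
            queue.append(nxt)
    return dist


def choose_meeting_point(grid, agents, sentinels):
    """Pick the cell reachable by all agents that minimizes the worst-case path.

    Hypothetical helper: assumes agents do not start on a sentinel cell.
    Returns (cell, worst_case_steps), or None if no common cell exists.
    """
    blocked = set(sentinels)
    per_agent = [bfs_distances(grid, a, blocked) for a in agents]
    common = set(per_agent[0])
    for d in per_agent[1:]:
        common &= set(d)
    if not common:
        return None
    # Tie-break on total distance, then on coordinates, for determinism.
    best = min(
        common,
        key=lambda cell: (
            max(d[cell] for d in per_agent),
            sum(d[cell] for d in per_agent),
            cell,
        ),
    )
    return best, max(d[best] for d in per_agent)


if __name__ == "__main__":
    grid = ["...", "...", "..."]
    # A sentinel occupies the center, so the agents must meet elsewhere.
    print(choose_meeting_point(grid, [(0, 0), (2, 2)], [(1, 1)]))
```

In a dynamic setting, one plausible use is to call `choose_meeting_point` again whenever an agent reports a changed sentinel position, which loosely mirrors the replanning idea in the abstract. How CoSaR actually negotiates the meeting point through language is not specified here.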
Problem

Research questions and friction points this paper is trying to address.

Cooperative Spatial Intelligence
Embodied Agents
Dynamic Environmental Constraints
Natural Language Communication
Multi-agent Coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cooperative Spatial Intelligence
Embodied Agents
CoSaR
Dynamic Communication
Spatial Reasoning