NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing navigation benchmarks predominantly emphasize semantic understanding and lack systematic evaluation of spatial perception and reasoning capabilities. To address this gap, we propose NavSpace, a benchmark comprising six categories of spatial reasoning tasks and 1,228 trajectory-instruction pairs, which establishes the first comprehensive evaluation framework for spatial intelligence in embodied navigation. We also introduce SNav, a new spatially intelligent navigation model that integrates multimodal large language models with an explicit spatial reasoning architecture. We evaluate 22 navigation agents on NavSpace, including state-of-the-art navigation models and multimodal large language models; SNav outperforms these existing agents, and tests on real robotic platforms further validate its generalizability and practicality. This work addresses gaps in both the assessment and the modeling of spatial intelligence for embodied navigation, and it uncovers fundamental challenges (geometric understanding, topological reasoning, and dynamic spatial alignment) that remain central to advancing autonomous navigation systems.

📝 Abstract
Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook the systematic evaluation of navigation agents' spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and in real-robot tests, establishing a strong baseline for future work.

Problem

Research questions and friction points this paper is trying to address.

Evaluating navigation agents' spatial perception and reasoning capabilities
Introducing a benchmark to probe spatial intelligence in navigation
Developing a model that outperforms existing agents on spatial tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces NavSpace benchmark for spatial intelligence
Proposes SNav model for enhanced navigation performance
Evaluates 22 agents on spatial perception capabilities

Authors

Haolin Yang
University of Chicago
large language models, natural language processing
Yuxing Long
Peking University
Embodied Intelligence
Zhuoyuan Yu
CFCS, School of Computer Science, Peking University
Zihan Yang
CFCS, School of Computer Science, Peking University
Minghan Wang
CFCS, School of Computer Science, Peking University
Jiapeng Xu
CFCS, School of Computer Science, Peking University
Yihan Wang
CFCS, School of Computer Science, Peking University
Ziyan Yu
CFCS, School of Computer Science, Peking University
Wenzhe Cai
Shanghai AI Laboratory
Reinforcement Learning, Visual Navigation, Robotics
Lei Kang
CFCS, School of Computer Science, Peking University
Hao Dong
CFCS, School of Computer Science, Peking University