From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

📅 2025-08-23
🤖 AI Summary
Current multimodal large language models (MLLMs) for embodied agents lack structured spatial memory and behave only reactively, which severely limits their generalization and adaptability in complex real-world environments. To address this, we propose BSC-Nav, a brain-inspired unified navigation framework that introduces the first allocentric (viewpoint-independent), dynamically retrievable cognitive map integrating landmark-, path-, and topology-level spatial knowledge. BSC-Nav combines a self-supervised autoencoder for map construction, context-aware trajectory modeling, and semantically aligned spatial knowledge retrieval to enable MLLM-driven vision-language spatial reasoning. Evaluated across diverse navigation benchmarks, BSC-Nav achieves state-of-the-art performance, demonstrates strong zero-shot generalization, and supports adaptive, goal-directed behaviors in realistic settings. Our approach establishes a novel, scalable paradigm for embodied spatial intelligence grounded in neurocognitive principles.

📝 Abstract
Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.
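
The abstract's step of building an allocentric map from egocentric observations can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify data structures, so the 2D pose format `(x, y, theta)`, the `(forward, left)` observation format, and the function names `ego_to_allo` and `build_landmark_map` are all assumptions made for illustration.

```python
import math

def ego_to_allo(pose, obs):
    """Convert an egocentric observation into world (allocentric) coordinates.

    pose: (x, y, theta), the agent's assumed world position and heading in radians.
    obs:  (forward, left), the offset of the observed object in the agent's frame.
    """
    x, y, theta = pose
    fwd, left = obs
    # Rotate the agent-frame offset by the heading, then translate by the position.
    wx = x + fwd * math.cos(theta) - left * math.sin(theta)
    wy = y + fwd * math.sin(theta) + left * math.cos(theta)
    return wx, wy

def build_landmark_map(trajectory):
    """Accumulate a hypothetical landmark map from a trajectory.

    trajectory: list of (pose, [(label, (forward, left)), ...]) steps.
    Returns {label: (x, y)}, averaging repeated sightings of the same label.
    """
    sightings = {}
    for pose, observations in trajectory:
        for label, obs in observations:
            sightings.setdefault(label, []).append(ego_to_allo(pose, obs))
    return {
        label: (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
        for label, pts in sightings.items()
    }

# Two views of the same chair from different poses map to one world location.
trajectory = [
    ((0.0, 0.0, 0.0), [("chair", (2.0, 0.0))]),
    ((2.0, -2.0, math.pi / 2), [("chair", (2.0, 0.0))]),
]
landmark_map = build_landmark_map(trajectory)
```

The point of the sketch is that observations tied to the agent's viewpoint become viewpoint-independent once expressed in a shared world frame, so sightings from different poses can be merged into one map entry.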

Problem

Research questions and friction points this paper is trying to address.

Develop structured spatial memory for embodied agents
Enable cognitive mapping from egocentric visual trajectories
Achieve zero-shot generalization in navigation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Builds allocentric cognitive maps from egocentric trajectories
Dynamically retrieves spatial knowledge aligned with goals
Integrates structured spatial memory with multimodal language models
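
To make the idea of retrieving spatial knowledge aligned with a goal more concrete, here is a minimal sketch under stated assumptions. The source does not say how BSC-Nav scores or stores memories, so cosine similarity over embedding vectors is a stand-in technique, and the memory layout, the toy 3-dimensional embeddings, and the names `cosine` and `retrieve` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(memory, goal_embedding, k=1):
    """Return the k memory entries whose embeddings best match the goal embedding."""
    ranked = sorted(
        memory,
        key=lambda entry: cosine(entry["embedding"], goal_embedding),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical spatial memory: each landmark has a world position and a toy embedding.
memory = [
    {"label": "sofa", "position": (1.0, 4.0), "embedding": [1.0, 0.0, 0.0]},
    {"label": "fridge", "position": (6.0, 2.0), "embedding": [0.0, 1.0, 0.0]},
    {"label": "bed", "position": (9.0, 7.0), "embedding": [0.0, 0.0, 1.0]},
]

# A goal embedding close to "sofa"; in practice it would come from a vision-language encoder.
goal = [0.9, 0.1, 0.0]
best = retrieve(memory, goal, k=1)[0]
target_position = best["position"]  # a navigation target in the allocentric map
```

In this sketch the retrieved entry supplies a world-frame position that a planner could navigate to, which is one simple way structured memory can replace purely reactive behavior.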
Shouwei Ruan
Institute of Artificial Intelligence, Beihang University, Beijing, China
Liyuan Wang
Tsinghua University
bio-inspired learning, continual learning, neuroscience
Caixin Kang
The University of Tokyo
Computer Vision, Trustworthy AI, Autonomous Driving, Generative Models
Qihui Zhu
Institute of Artificial Intelligence, Beihang University, Beijing, China
Songming Liu
PhD in Computer Science, Tsinghua University
AI, machine learning, robotics, physics
Xingxing Wei
Professor of Artificial Intelligence, Beihang University
Computer vision, Adversarial machine learning
Hang Su
Department of Computer Science and Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, China