🤖 AI Summary
Current multimodal large language models (MLLMs) deployed in embodied agents lack structured spatial memory and behave reactively, which limits their generalization and adaptability in complex real-world environments. To address this, the authors propose BSC-Nav, a brain-inspired unified navigation framework that constructs allocentric cognitive maps from egocentric trajectories and contextual cues, consolidating landmark-, route-, and survey-level spatial knowledge. At inference time, BSC-Nav dynamically retrieves spatial knowledge aligned with semantic goals and integrates it with powerful MLLMs for visual-language spatial reasoning. Across diverse navigation benchmarks, BSC-Nav achieves state-of-the-art efficacy and efficiency, demonstrates strong zero-shot generalization, and supports versatile goal-directed behaviors in the real physical world, establishing a scalable, biologically grounded paradigm for embodied spatial intelligence.
📝 Abstract
Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: *landmarks* for salient cues, *route knowledge* for movement trajectories, and *survey knowledge* for map-like representations. While recent advances in multimodal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.
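To make the three forms of spatial knowledge concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of an allocentric memory that stores landmark, route, and survey entries and retrieves the landmark best aligned with a semantic goal embedding. The `CognitiveMap` class, its methods, and the toy 2-D embeddings are all illustrative assumptions; a real system would use learned visual-language embeddings and a richer map representation.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors (illustrative stand-in
    # for semantic alignment between a goal and stored landmarks).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class CognitiveMap:
    """Toy allocentric spatial memory with the three knowledge forms."""

    def __init__(self):
        self.landmarks = []  # (name, embedding, (x, y)): salient cues
        self.routes = []     # lists of waypoints: movement trajectories
        self.survey = {}     # name -> (x, y): map-like global layout

    def add_landmark(self, name, embedding, position):
        self.landmarks.append((name, embedding, position))
        self.survey[name] = position  # consolidate into survey knowledge

    def add_route(self, waypoints):
        self.routes.append(list(waypoints))

    def retrieve(self, goal_embedding):
        """Return the landmark (name, allocentric position) best matching the goal."""
        best = max(self.landmarks, key=lambda lm: cosine(lm[1], goal_embedding))
        return best[0], self.survey[best[0]]

# Usage: store two landmarks, then retrieve by a semantic goal vector.
m = CognitiveMap()
m.add_landmark("sofa", [1.0, 0.0], (2, 3))
m.add_landmark("fridge", [0.0, 1.0], (5, 1))
m.add_route([(0, 0), (2, 3)])
name, pos = m.retrieve([0.9, 0.1])  # goal closest to "sofa"
```

The design choice illustrated here is that landmark entries double as keys into the survey layer, so a semantic query resolves directly to an allocentric position that a planner can navigate toward.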