DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

📅 2025-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Zero-shot adaptive navigation for household service robots in unfamiliar environments faces three main challenges: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding. This paper proposes DORAEMON, a cognitive-inspired framework built from a Ventral Stream and a Dorsal Stream, which uses vision-language models (VLMs) to navigate without prior map building or pre-training. The Dorsal Stream combines Hierarchical Semantic-Spatial Fusion with a Topology Map, and the Ventral Stream combines RAG-VLM and Policy-VLM for decision-making. The paper also develops Nav-Ensurance for navigation safety and efficiency, and introduces AORI, a new metric for navigation intelligence. Evaluated on the HM3D, MP3D, and GOAT datasets, the approach achieves state-of-the-art success rate (SR) and success weighted by path length (SPL), significantly outperforming existing methods.

📝 Abstract
Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
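The two reported metrics, SR and SPL, can be illustrated with a minimal sketch. It uses the commonly cited definition of SPL (each successful episode contributes the ratio of shortest-path length to the longer of the agent's path and the shortest path, averaged over all episodes). The `Episode` record, its field names, and the sample values are hypothetical and are not taken from the paper's evaluation code. AORI is not sketched here because the abstract does not define it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical record layout for illustration)."""
    success: bool         # whether the agent reached the goal
    shortest_path: float  # shortest start-to-goal distance, in metres
    agent_path: float     # length of the path the agent actually travelled, in metres


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: success weighted by path length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute zero
        denom = max(e.agent_path, e.shortest_path)
        # Assumption: an episode that starts at the goal (zero-length paths)
        # counts as perfectly efficient.
        total += e.shortest_path / denom if denom > 0 else 1.0
    return total / len(episodes)


# Made-up sample data, not results from the paper.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=True, shortest_path=4.0, agent_path=4.0),
    Episode(success=False, shortest_path=3.0, agent_path=6.0),
    Episode(success=True, shortest_path=2.0, agent_path=8.0),
]
sr_value = success_rate(episodes)
spl_value = spl(episodes)
```

SPL can never exceed SR, because each successful episode contributes at most 1. A method with high SR but low SPL reaches goals by long detours.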
Problem

Research questions and friction points this paper is trying to address.

Enables adaptive navigation in unfamiliar environments for robots
Addresses spatiotemporal discontinuity and unstructured memory in navigation
Improves decision-making and safety in zero-shot autonomous navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Semantic-Spatial Fusion for navigation
RAG-VLM and Policy-VLM for decision-making
Nav-Ensurance ensures safety and efficiency
Tianjun Gu
East China Normal University
Linfeng Li
East China Normal University
Xuhong Wang
Shanghai Artificial Intelligence Laboratory
LLM, Knowledge System, AI Simulation
Chenghua Gong
University of Science and Technology of China
Graph Mining, Large Language Model, Social Computing
Jingyu Gong
Shanghai Jiao Tong University
3D Computer Vision
Zhizhong Zhang
Associate Researcher, East China Normal University
Computer Vision
Yuan Xie
East China Normal University, Shanghai Innovation Institute
Lizhuang Ma
East China Normal University
Xin Tan
East China Normal University, Shanghai AI Lab