TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation

📅 2026-05-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing zero-shot object navigation methods: they rely on static, internet-derived commonsense knowledge and cannot learn continually from embodied 3D experience. To overcome this, the authors propose TrajRAG, a framework that introduces, for the first time according to the authors, a retrieval-augmented mechanism built on geometric-semantic navigation experiences. TrajRAG uses a topological-polar trajectory representation to compactly encode spatial layouts and semantic contexts, and builds a hierarchical chunk structure to enable coarse-to-fine experience retrieval. Retrieved experiences are then passed to large language or vision-language models, which reason about them and generate waypoints. The framework supports lifelong learning and reuses historical navigation data. Experiments on MP3D, HM3D-v1, and HM3D-v2 show improved zero-shot navigation performance, supporting the effectiveness of TrajRAG in experience retrieval and decision-making.
📝 Abstract
Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typically discarded, preventing the accumulation of lifelong experience. To this end, we propose Trajectory RAG (TrajRAG), a retrieval-augmented generation framework that enhances large-model reasoning by retrieving geometric-semantic experiences. TrajRAG incrementally accumulates episodic observations from past navigation episodes. To structure these observations, we propose a topological-polar (topo-polar) trajectory representation that compactly encodes spatial layouts and semantic contexts, effectively removing redundancies in raw episodic observations. A hierarchical chunking structure further organizes similar topo-polar trajectories into unified summaries, enabling coarse-to-fine retrieval. During navigation, candidate frontiers generate multiple trajectory hypotheses that query TrajRAG for similar past trajectories, guiding large-model reasoning for waypoint selection. New experiences are continually consolidated into TrajRAG, enabling the accumulation of lifelong navigation experience. Experiments on MP3D, HM3D-v1, and HM3D-v2 show that TrajRAG effectively retrieves relevant geometric-semantic experiences and improves zero-shot ObjectNav performance.
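The coarse-to-fine retrieval described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the polar-sector label histogram used as a stand-in for the topo-polar descriptor, cosine similarity as the matching score, threshold-based merging into chunks, and the names `encode_trajectory`, `Chunk`, and `ExperienceStore`.

```python
import math
from dataclasses import dataclass, field

LABELS = ["bed", "sofa", "toilet", "table"]  # illustrative vocabulary (assumption)
N_SECTORS = 4                                # polar sectors around a node (assumption)

def encode_node(observations):
    """Histogram of (sector, label) counts for one node; input is [(angle_rad, label)]."""
    vec = [0.0] * (N_SECTORS * len(LABELS))
    for angle, label in observations:
        sector = int((angle % (2 * math.pi)) / (2 * math.pi) * N_SECTORS) % N_SECTORS
        vec[sector * len(LABELS) + LABELS.index(label)] += 1.0
    return vec

def encode_trajectory(nodes):
    """Average node descriptors into one fixed-length trajectory vector."""
    vecs = [encode_node(n) for n in nodes]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Chunk:
    members: list = field(default_factory=list)  # list of (vector, payload)

    def summary(self):
        """Coarse level: mean of all member vectors."""
        vecs = [v for v, _ in self.members]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

class ExperienceStore:
    def __init__(self, merge_threshold=0.8):
        self.chunks = []
        self.merge_threshold = merge_threshold

    def add(self, vector, payload):
        """Merge into the most similar chunk, or start a new chunk."""
        best = max(self.chunks, key=lambda c: cosine(c.summary(), vector), default=None)
        if best is not None and cosine(best.summary(), vector) >= self.merge_threshold:
            best.members.append((vector, payload))
        else:
            self.chunks.append(Chunk([(vector, payload)]))

    def retrieve(self, query, top_chunks=1, top_k=2):
        """Rank chunk summaries first, then rank members of the best chunks."""
        coarse = sorted(self.chunks, key=lambda c: cosine(c.summary(), query),
                        reverse=True)[:top_chunks]
        fine = [(cosine(v, query), p) for c in coarse for v, p in c.members]
        fine.sort(key=lambda t: t[0], reverse=True)
        return [p for _, p in fine[:top_k]]

store = ExperienceStore()
store.add(encode_trajectory([[(0.1, "bed")], [(0.2, "bed"), (3.0, "table")]]), "bedroom episode A")
store.add(encode_trajectory([[(0.3, "bed")]]), "bedroom episode B")
store.add(encode_trajectory([[(4.0, "toilet")]]), "bathroom episode")

# A trajectory hypothesis toward a frontier where a bed is visible ahead.
hypothesis = encode_trajectory([[(0.05, "bed")]])
print(store.retrieve(hypothesis, top_chunks=1, top_k=1))  # -> ['bedroom episode B']
```

In this sketch the retrieved payloads would be handed to the language or vision-language model as context for choosing among candidate frontiers, and the finished episode would be written back with `add`, which mirrors the accumulate-then-reuse loop the abstract outlines.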
Problem

Research questions and friction points this paper is trying to address.

Zero-Shot Object Navigation
Embodied Experience
Geometric-Semantic Knowledge
Lifelong Learning
Episodic Memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory RAG
geometric-semantic experience
topo-polar representation
zero-shot ObjectNav
lifelong navigation
Yiyao Wang
State Key Lab of CAD&CG, Zhejiang University
visualization
Sixian Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences, Beijing
Keming Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences, Beijing
Xinhang Song
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing; University of Chinese Academy of Sciences, Beijing
Songjie Du
University of Chinese Academy of Sciences, Beijing; Institute of Computing Technology, Chinese Academy of Sciences, Beijing
Shuqiang Jiang
Institute of Computing Technology, Chinese Academy of Sciences
Multimedia Analysis
Visual Understanding and Retrieval
Multimodal Intelligence