SSMG-Nav: Enhancing Lifelong Object Navigation with Semantic Skeleton Memory Graph

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inefficiency of service robot navigation in unfamiliar environments, which stems from a lack of persistent memory, reliance on single-modality inputs, and short-sighted planning. To overcome these limitations, the authors propose the Semantic Skeleton Memory Graph (SSMG), which anchors topological keypoints to maintain spatially aligned long-term memory, clusters environmental entities into semantic subgraphs, and integrates vision-language models for multimodal goal understanding. Furthermore, a long-horizon planner is introduced to balance belief confidence against traversal cost, optimizing the sequence of target visits to minimize backtracking. Evaluated on both lifelong navigation and standard ObjectNav benchmarks, the proposed method significantly outperforms strong existing baselines, achieving higher success rates and improved path efficiency.
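The memory structure summarized above can be pictured with a small sketch. It is illustrative only: the `Keypoint` class, the `build_subgraphs` helper, and the nearest-keypoint grouping rule are assumptions made for this example, not the authors' actual clustering procedure.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Keypoint:
    """A topological anchor such as a junction or room center (hypothetical type)."""
    name: str
    xy: tuple[float, float]
    entities: list[str] = field(default_factory=list)


def build_subgraphs(keypoints, entities):
    """Attach each observed entity to its nearest keypoint.

    Nearest-keypoint assignment is an assumed stand-in for the
    paper's clustering of nearby entities into subgraphs.
    """
    for label, xy in entities:
        nearest = min(keypoints, key=lambda k: math.dist(k.xy, xy))
        nearest.entities.append(label)
    # One subgraph per keypoint: keypoint name -> entity labels.
    return {k.name: k.entities for k in keypoints}


keypoints = [Keypoint("kitchen_center", (0.0, 0.0)),
             Keypoint("hall_junction", (5.0, 0.0))]
observed = [("sink", (0.5, 1.0)),
            ("fridge", (-1.0, 0.0)),
            ("coat rack", (4.5, 0.5))]
subgraphs = build_subgraphs(keypoints, observed)
```

Each keypoint then acts as one candidate destination whose attached entities describe what is likely to be found there.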

📝 Abstract

Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, persistent memory, constraining performance in lifelong settings. Many are additionally limited to single-modality inputs and employ myopic greedy policies, which often induce inefficient back-and-forth maneuvers (BFMs). To address such limitations, we introduce SSMG-Nav, a framework for object navigation built on a Semantic Skeleton Memory Graph (SSMG) that consolidates past observations into a spatially aligned, persistent memory anchored by topological keypoints (e.g., junctions, room centers). SSMG clusters nearby entities into subgraphs, unifying entity- and space-level semantics to yield a compact set of candidate destinations. To support multimodal targets (images, objects, and text), we integrate a vision-language model (VLM). For each subgraph, a multimodal prompt synthesized from memory guides the VLM to infer a target belief over destinations. A long-horizon planner then trades off this belief against traversability costs to produce a visit sequence that minimizes expected path length, thereby reducing backtracking. Extensive experiments on challenging lifelong benchmarks and standard ObjectNav benchmarks demonstrate that, compared to strong baselines, our method achieves higher success rates and greater path efficiency, validating the effectiveness of SSMG-Nav.
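The belief-versus-cost trade-off in the planner can be illustrated with a minimal sketch. It assumes the belief is a probability per candidate destination, uses straight-line distance as a stand-in for traversability cost, and searches all visit orders by brute force. The function names and the objective's exact form are assumptions for illustration, not the paper's formulation.

```python
from itertools import permutations
import math


def expected_path_length(order, start, positions, belief):
    """Expected distance travelled before the target is found along `order`."""
    expected, travelled, here = 0.0, 0.0, start
    for name in order:
        travelled += math.dist(here, positions[name])
        # With probability belief[name] the search ends at this stop.
        expected += belief[name] * travelled
        here = positions[name]
    # If the target is at none of the candidates, the full route is walked.
    leftover = max(0.0, 1.0 - sum(belief[name] for name in order))
    return expected + leftover * travelled


def plan_visit_sequence(start, positions, belief):
    """Brute-force the visit order with the lowest expected path length."""
    return min(permutations(positions),
               key=lambda order: expected_path_length(order, start, positions, belief))


start = (0.0, 0.0)
positions = {"kitchen": (1.0, 0.0), "bedroom": (-4.0, 0.0), "hallway": (2.0, 0.0)}
belief = {"kitchen": 0.2, "bedroom": 0.5, "hallway": 0.3}

plan = plan_visit_sequence(start, positions, belief)
greedy = tuple(sorted(belief, key=belief.get, reverse=True))
```

In this toy layout the greedy order (highest belief first) walks to the distant bedroom and then doubles back, while the planned order checks the two nearby candidates on the way, which gives a lower expected path length. Brute force is only practical for a handful of candidates, which is where a compact set of destinations helps.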

Problem

Research questions and friction points this paper is trying to address.

lifelong object navigation

persistent memory

multimodal inputs

back-and-forth maneuvers

out-of-sight targets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Skeleton Memory Graph

Lifelong Object Navigation

Multimodal Target Grounding