🤖 AI Summary
This work addresses the inefficiency of service robot navigation in unfamiliar environments, which stems from a lack of persistent memory, reliance on single-modality inputs, and short-sighted greedy planning. To overcome these limitations, the authors propose SSMG-Nav, an object navigation framework built on a Semantic Skeleton Memory Graph (SSMG). The SSMG anchors long-term memory to topological keypoints so that it stays spatially aligned, and clusters nearby entities into semantic subgraphs that serve as candidate destinations. A vision-language model, prompted from this memory, infers a target belief over those destinations for image, object, and text goals. A long-horizon planner then balances this belief against traversal cost and orders the destination visits to minimize expected path length and reduce backtracking. Evaluated on both lifelong navigation and standard ObjectNav benchmarks, the method achieves higher success rates and better path efficiency than strong baselines.
📝 Abstract
Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, persistent memory, constraining performance in lifelong settings. Many are additionally limited to single-modality inputs and employ myopic greedy policies, which often induce inefficient back-and-forth maneuvers (BFMs). To address such limitations, we introduce SSMG-Nav, a framework for object navigation built on a Semantic Skeleton Memory Graph (SSMG) that consolidates past observations into a spatially aligned, persistent memory anchored by topological keypoints (e.g., junctions, room centers). SSMG clusters nearby entities into subgraphs, unifying entity- and space-level semantics to yield a compact set of candidate destinations. To support multimodal targets (images, objects, and text), we integrate a vision-language model (VLM). For each subgraph, a multimodal prompt synthesized from memory guides the VLM to infer a target belief over destinations. A long-horizon planner then trades off this belief against traversability costs to produce a visit sequence that minimizes expected path length, thereby reducing backtracking. Extensive experiments on challenging lifelong benchmarks and standard ObjectNav benchmarks demonstrate that, compared to strong baselines, our method achieves higher success rates and greater path efficiency, validating the effectiveness of SSMG-Nav.
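To illustrate the planner's trade-off between target belief and traversal cost, here is a minimal sketch. It is not the authors' implementation. The abstract does not state the objective, so the sketch assumes the target sits at exactly one candidate destination, the search stops once it is found, and the expected path length is the belief-weighted distance travelled up to each destination. The names `expected_path_length` and `plan_visit_sequence`, the one-dimensional layout, and the belief values are all hypothetical.

```python
from itertools import permutations


def expected_path_length(order, belief, start_cost, cost):
    """Belief-weighted distance travelled until the target is found.

    Assumes the target is at exactly one destination and the robot
    stops as soon as it reaches that destination.
    """
    travelled = 0.0
    expected = 0.0
    previous = None
    for dest in order:
        # Distance from the start for the first stop, else from the last stop.
        travelled += start_cost[dest] if previous is None else cost[previous][dest]
        expected += belief[dest] * travelled
        previous = dest
    return expected


def plan_visit_sequence(belief, start_cost, cost):
    """Exhaustively search visit orders; only practical for a few destinations."""
    candidates = range(len(belief))
    return min(
        permutations(candidates),
        key=lambda order: expected_path_length(order, belief, start_cost, cost),
    )


# Hypothetical corridor: robot at x = 0, three candidate destinations.
positions = [10.0, -2.0, 12.0]
belief = [0.45, 0.35, 0.20]  # assumed VLM-derived beliefs, summing to 1
start_cost = [abs(p) for p in positions]
cost = [[abs(a - b) for b in positions] for a in positions]

best_order = plan_visit_sequence(belief, start_cost, cost)
# Myopic baseline: visit destinations in order of decreasing belief.
greedy_order = tuple(sorted(range(len(belief)), key=lambda i: -belief[i]))

print(best_order, expected_path_length(best_order, belief, start_cost, cost))
print(greedy_order, expected_path_length(greedy_order, belief, start_cost, cost))
```

In this toy layout, the belief-greedy order goes to the far destination first and then has to backtrack. The cost-aware order checks the nearby, slightly less likely destination first, which roughly halves the expected path length (about 10.2 versus 19.4). Exhaustive search grows factorially with the number of destinations, so a real planner would need a heuristic or a bounded candidate set.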