Image-Goal Navigation Using Refined Feature Guidance and Scene Graph Enhancement

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of modeling fine-grained associations among goals, observations, and environments under limited image data in image-goal navigation, this paper proposes a lightweight end-to-end navigation framework. The method introduces three key contributions: (1) a spatial-channel joint attention mechanism that models multi-scale spatial structures and channel-wise semantic relationships when fusing goal and observation features; (2) a self-distillation-based feature enhancement strategy that improves representation robustness under limited data; and (3) an image scene graph that unifies scene-level contextual understanding with object-level topological relations. Evaluated on the cross-scene Gibson and HM3D benchmarks, the framework achieves state-of-the-art performance while maintaining a real-time inference speed of 53.5 FPS on an RTX 3080 GPU. The source code and pretrained models are publicly released.

📝 Abstract
In this paper, we introduce a novel image-goal navigation approach, named RFSG. Our focus lies in leveraging the fine-grained connections between goals, observations, and the environment within limited image data, all the while keeping the navigation architecture simple and lightweight. To this end, we propose the spatial-channel attention mechanism, enabling the network to learn the importance of multi-dimensional features to fuse the goal and observation features. In addition, a self-distillation mechanism is incorporated to further enhance the feature representation capabilities. Given that the navigation task needs surrounding environmental information for more efficient navigation, we propose an image scene graph to establish feature associations at both the image and object levels, effectively encoding the surrounding scene information. Cross-scene performance validation was conducted on the Gibson and HM3D datasets, and the proposed method achieved state-of-the-art results among mainstream methods, with a speed of up to 53.5 frames per second on an RTX 3080. This contributes to the realization of end-to-end image-goal navigation in real-world scenarios. The implementation and model of our method have been released at: https://github.com/nubot-nudt/RFSG.
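The abstract describes fusing goal and observation features by weighting both channels and spatial positions. The following is a minimal NumPy sketch of what such a spatial-channel fusion could look like; the function names and the exact weighting scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_channel_fusion(goal, obs):
    """Fuse a goal feature map with an observation feature map, both (C, H, W),
    by up-weighting channels and spatial positions where the two agree.
    Toy sketch only; the paper's mechanism may differ substantially."""
    C, H, W = obs.shape
    corr = goal * obs                                   # elementwise agreement
    ch_w = softmax(corr.reshape(C, -1).mean(axis=1))    # (C,) channel weights
    sp_w = softmax(corr.mean(axis=0).reshape(-1)).reshape(H, W)  # (H, W) spatial weights
    return obs * ch_w[:, None, None] * sp_w[None, :, :]
```

The idea is that both attention maps are derived from the same goal-observation correlation, so the fused map emphasizes features relevant to reaching the goal.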
Problem

Research questions and friction points this paper is trying to address.

Develops image-goal navigation using refined feature guidance.
Enhances scene understanding with image scene graph.
Achieves efficient real-world navigation with lightweight architecture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial-channel attention mechanism for feature fusion
Self-distillation enhances feature representation capabilities
Image scene graph encodes environmental information effectively
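The image scene graph is described as linking features at both the image and object levels. A minimal sketch of one plausible construction is shown below: a single image-level node connected to all object nodes, with object-object edges weighted by spatial proximity. All names, the proximity kernel, and the message-passing step are assumptions for illustration, not the paper's design.

```python
import numpy as np

def build_image_scene_graph(obj_feats, obj_centers, img_feat, sigma=0.5):
    """Build a small graph: node 0 is the image-level feature, nodes 1..N are
    object features. Object-object edge weights decay with center distance.
    obj_feats: (N, D), obj_centers: (N, 2) normalized, img_feat: (D,)."""
    N = len(obj_feats)
    nodes = np.vstack([img_feat[None, :], obj_feats])
    adj = np.zeros((N + 1, N + 1))
    adj[0, 1:] = adj[1:, 0] = 1.0                      # image <-> every object
    for i in range(N):
        for j in range(i + 1, N):
            d = np.linalg.norm(obj_centers[i] - obj_centers[j])
            adj[i + 1, j + 1] = adj[j + 1, i + 1] = np.exp(-d / sigma)
    return nodes, adj

def propagate(nodes, adj):
    """One round of degree-normalized message passing over the graph."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    return nodes + (adj @ nodes) / deg
```

After a propagation step, each object feature carries context from nearby objects and from the global image feature, which is the kind of scene-level encoding the bullets above allude to.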
Zhicheng Feng
College of Intelligence Science and Technology, and the National Key Laboratory of Equipment State Sensing and Smart Support, National University of Defense Technology, China

Xieyuanli Chen
Associate Professor, NUDT, China
Robotics · SLAM · Localization · LiDAR Perception · Robot Learning

Chenghao Shi
College of Intelligence Science and Technology, and the National Key Laboratory of Equipment State Sensing and Smart Support, National University of Defense Technology, China

Lun Luo
Zhejiang University
SLAM · Place Recognition

Zhichao Chen
Jiangxi University of Science and Technology, China

Yun-Hui Liu
T Stone Robotics Institute and Department of Mechanical and Automation Engineering, the Chinese University of Hong Kong, China

Huimin Lu
National University of Defense Technology
Robot Vision · Multi-robot Coordination · Robot Soccer · Robot Rescue