Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing visual object navigation methods suffer from two key limitations: (1) coupling search and path planning into a single stage with shared reward signals, leading to insufficient training or overfitting; and (2) relying on generic visual encoders that ignore depth and dynamic obstacle information, hindering effective policy learning. To address these issues, we propose a decoupled two-stage navigation framework: (i) a differentiated reward mechanism that separately optimizes target search coverage and path navigation accuracy; (ii) an RGB-D pre-trained depth-aware feature extractor integrated with an online-constructed obstacle map and semantic cues for multimodal state representation; and (iii) end-to-end joint optimization via two-stage reinforcement learning. Evaluated on AI2-Thor and RoboTHOR, our method achieves significant improvements over state-of-the-art approaches: higher success rates, improved path efficiency, and substantially reduced collision and deadlock rates.
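The differentiated reward mechanism can be illustrated with a short sketch. Note that the function name, coefficients, and stage-switch inputs below are illustrative assumptions for exposition; the paper's actual reward terms are not given on this page.

```python
# Illustrative sketch of a decoupled two-stage reward.
# Coefficients (0.1, 5.0) and the stage interface are assumptions,
# not values taken from the paper.

def two_stage_reward(stage, newly_covered_cells, step_cost,
                     reached_goal, path_len, shortest_len):
    """Return the reward for one step, differentiated by stage."""
    if stage == "search":
        # Searching stage: reward area coverage to encourage
        # wide exploration of the scene.
        return 0.1 * newly_covered_cells - step_cost
    # Navigating stage: reward reaching the target via an
    # efficient path (success weighted by path optimality).
    if reached_goal:
        return 5.0 * shortest_len / max(path_len, shortest_len)
    return -step_cost
```

Separating the two signals lets the search policy be rewarded for coverage without penalizing detours, while the navigation policy is rewarded only for efficient goal-reaching.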

๐Ÿ“ Abstract
The task that requires an agent to navigate to a given object using only visual observations is called visual object navigation (VON). The main bottlenecks of VON are strategy exploration and prior-knowledge exploitation. Traditional strategy exploration ignores the differences between the searching and navigating stages, using the same reward in both, which reduces navigation performance and training efficiency. Our study enables the agent to explore a larger area in the searching stage and seek the optimal path in the navigating stage, improving the success rate of navigation. Traditional prior-knowledge exploitation focuses on learning and utilizing object associations, ignoring the depth and obstacle information in the environment. This paper uses the RGB and depth information of the training scenes to pretrain the feature extractor, which improves navigation efficiency. Obstacle information is memorized by the agent during navigation, reducing the probability of collision and deadlock. Depth, obstacle, and other prior knowledge are concatenated and fed into the policy network, which outputs navigation actions under the training of two-stage rewards. We evaluated our method on AI2-Thor and RoboTHOR and demonstrated that it significantly outperforms state-of-the-art (SOTA) methods in success rate and navigation efficiency.
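The obstacle memorization described above can be sketched as a simple online grid map that records collision cells and is queried before acting. The class name, grid resolution, and update rule here are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of an online obstacle map memorized during navigation.
# Grid size and the binary update rule are illustrative assumptions.
import numpy as np

class ObstacleMap:
    def __init__(self, size=100):
        # 0 = free / unknown, 1 = cell where a collision was observed.
        self.grid = np.zeros((size, size), dtype=np.uint8)

    def record_collision(self, x, y):
        """Memorize the cell in which the agent collided."""
        self.grid[y, x] = 1

    def is_blocked(self, x, y):
        """Query before moving, to avoid repeated collisions and deadlock."""
        return bool(self.grid[y, x])
```

Feeding such a map into the policy's state representation gives the agent persistent memory of obstacles that a per-frame visual encoder alone would forget.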
Problem

Research questions and friction points this paper is trying to address.

Decoupling search and pathfinding in object navigation tasks
Improving spatial perception with depth-enhanced visual encoding
Assessing search efficiency with a new weighted metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-Stage Reward Mechanism decouples searching and pathfinding
Depth Enhanced Masked Autoencoders improve spatial perception
New metric SSSPL evaluates searching ability and efficiency
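The exact definition of SSSPL is not given on this page, but as a success-weighted search metric it presumably follows the form of the standard SPL metric, sketched below. The function name and episode inputs are assumptions for illustration.

```python
# SPL-style success-weighted path length:
# average of S_i * l_i / max(p_i, l_i) over episodes, where S_i is
# success (0/1), l_i the shortest path length, p_i the taken path length.
# SSSPL presumably adapts this form to the searching stage.

def spl(successes, shortest_lengths, taken_lengths):
    """Compute the success-weighted path-length average."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)
    return total / len(successes)
```

An episode counts fully only when the agent succeeds via the shortest possible path; longer successful paths are discounted proportionally.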
Yanwei Zheng
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Shaopu Feng
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Bowen Huang
Electrical Engineer, Optimization and Control, PNNL
Control theory · Transfer operator theory · Power system analysis
Changrui Li
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Xiao Zhang
School of Computer Science and Technology, Shandong University, Qingdao 266237, China
Dongxiao Yu
Professor of Computer Science, Shandong University
Distributed Computing · Wireless Networking · Graph Algorithms