🤖 AI Summary
This work addresses the challenge of navigation in large-scale, visually complex, and reward-sparse environments. We propose a deep reinforcement learning method that jointly leverages an explicit topological map and object-centric macro-actions. The agent constructs a lightweight topological map from RGB-D observations and defines high-level semantic macro-actions anchored on salient objects, enabling policy abstraction and cross-scene generalization. To our knowledge, this is the first approach to co-model object-level macro-actions with an explicit topological structure, significantly improving sample efficiency and policy interpretability. Evaluated in a photorealistic 3D rasterized simulation environment, our method substantially outperforms a random baseline in task success rate, while supporting both dense (immediate) and sparse (terminal) reward settings. Experimental results demonstrate that integrating topological priors with macro-action abstraction effectively alleviates key bottlenecks of pixel-level end-to-end learning.
📝 Abstract
This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.