🤖 AI Summary
To address the low exploration efficiency and poor scalability of autonomous agents in large-scale environments, this paper proposes an attention-based hierarchical reinforcement learning framework. Methodologically, it (1) constructs an incremental, shape-adaptive hierarchical graph structure to enable multi-scale spatial reasoning; (2) devises a community-aware global graph update algorithm with linear time complexity; and (3) introduces a parameter-free privileged reward mechanism that eliminates reward-shaping bias and guides near-optimal exploration policies. The approach integrates hierarchical graph representation learning, incremental map updating, and multi-scale belief inference. Experiments demonstrate up to a 20% improvement in exploration efficiency over state-of-the-art methods in large-scale simulations. Furthermore, the framework has been successfully deployed in a real-world campus environment measuring 300 m × 230 m, validating its efficiency, scalability, and practical applicability.
📝 Abstract
This work pushes the boundaries of learning-based methods for autonomous robot exploration in terms of environmental scale and exploration efficiency. We present HEADER, an attention-based reinforcement learning approach with hierarchical graphs for efficient exploration in large-scale environments. HEADER follows conventional methods in constructing hierarchical representations of the robot belief/map, but further introduces a novel community-based algorithm to construct and update a global graph that is fully incremental, shape-adaptive, and of linear time complexity. Building upon attention-based networks, our planner reasons finely about the nearby belief within the local range while coarsely leveraging distant information at the global scale, enabling next-best-viewpoint decisions that account for multi-scale spatial dependencies. Beyond the novel map representation, we introduce a parameter-free privileged reward that significantly improves model performance and produces near-optimal exploration behaviors by avoiding the training-objective bias caused by handcrafted reward shaping. In challenging, large-scale simulated exploration scenarios, HEADER demonstrates better scalability than most existing learning and non-learning methods, while achieving a significant improvement in exploration efficiency (up to 20%) over state-of-the-art baselines. We also deploy HEADER on hardware and validate it in complex, large-scale real-world scenarios, including a 300 m × 230 m campus environment.
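To make the community-based hierarchical map idea concrete, the following is a minimal, self-contained sketch of one way an incremental two-level graph could work; it is not the paper's actual algorithm (which is not specified here), and all class and method names (`HierarchicalGraph`, `add_node`) are hypothetical. Each newly added fine-scale node joins the neighboring community with the most links (a simple label-propagation-style heuristic) or starts a new one, and the coarse community-level graph is patched only around the new node, keeping each update local rather than requiring a global rebuild.

```python
from collections import defaultdict

class HierarchicalGraph:
    """Toy two-level map: a fine-scale graph whose nodes are grouped
    into communities, plus a coarse graph over those communities.
    Illustrative sketch only, not HEADER's algorithm."""

    def __init__(self):
        self.adj = defaultdict(set)         # fine-scale adjacency
        self.community = {}                 # node -> community id
        self.global_adj = defaultdict(set)  # community-level adjacency
        self._next_cid = 0

    def add_node(self, v, neighbors=()):
        # Connect the new node at the fine scale.
        for u in neighbors:
            self.adj[v].add(u)
            self.adj[u].add(v)
        # Vote: join the neighboring community with the most links.
        votes = defaultdict(int)
        for u in self.adj[v]:
            if u in self.community:
                votes[self.community[u]] += 1
        if votes:
            cid = max(votes, key=votes.get)
        else:
            cid = self._next_cid  # no labeled neighbors: new community
            self._next_cid += 1
        self.community[v] = cid
        # Patch the coarse graph only around v, in O(deg(v)) time.
        for u in self.adj[v]:
            cu = self.community.get(u)
            if cu is not None and cu != cid:
                self.global_adj[cid].add(cu)
                self.global_adj[cu].add(cid)
```

For example, adding `"b"` with neighbor `"a"` places it in `"a"`'s community, while an isolated `"c"` starts a new one; a later node bridging the two creates a single coarse edge between the communities. The point of the sketch is the cost profile the abstract claims: per-node work proportional to the node's degree, hence linear overall, rather than recomputing communities from scratch.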