🤖 AI Summary
Existing reinforcement learning–based autonomous exploration methods exhibit weak reasoning over graph-structured environments and give insufficient consideration to robot motion, yielding policies that minimize travel distance at the expense of time efficiency. This work proposes GRATE, a graph transformer–enhanced deep reinforcement learning framework: a graph transformer models the environmental informative graph to capture both local structural patterns and long-range global dependencies, and a Kalman filter smooths the policy's waypoint outputs so that the resulting path is kinodynamically feasible for the robot to follow. The framework thereby improves exploration coverage, traversal distance, and time cost together. Experiments across multiple simulated environments demonstrate reductions of up to 21.5% in exploration distance and 21.3% in exploration time relative to state-of-the-art conventional and learning-based baselines, and deployment on a physical robot platform further validates the planner's robustness and real-time performance.
📝 Abstract
Autonomous robot exploration (ARE) is the process by which a robot autonomously navigates and maps an unknown environment. Recent Reinforcement Learning (RL)-based approaches typically formulate ARE as a sequential decision-making problem defined on a collision-free informative graph. However, these methods often demonstrate limited reasoning ability over graph-structured data. Moreover, due to insufficient consideration of robot motion, the resulting RL policies are generally optimized to minimize travel distance while neglecting time efficiency. To overcome these limitations, we propose GRATE, a Deep Reinforcement Learning (DRL)-based approach that leverages a Graph Transformer to effectively capture both local structural patterns and global contextual dependencies of the informative graph, thereby enhancing the model's reasoning capability across the entire environment. In addition, we deploy a Kalman filter to smooth the waypoint outputs, ensuring that the resulting path is kinodynamically feasible for the robot to follow. Experimental results demonstrate that our method achieves better exploration efficiency than state-of-the-art conventional and learning-based baselines across various simulation benchmarks, reducing the distance to complete exploration by up to 21.5% and the time by up to 21.3%. We also validate our planner in real-world scenarios.
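The abstract describes the Graph Transformer as capturing both local structural patterns and global contextual dependencies of the informative graph. The NumPy sketch below illustrates one way such a layer can work, not GRATE's actual architecture (which is not specified here): full pairwise attention provides global context, while a hypothetical additive edge bias (`edge_bias` applied through the adjacency matrix `A`) steers attention toward graph neighbors to encode local structure.

```python
# Minimal sketch of one attention layer over an informative graph.
# Assumptions (not from the paper): node features X of shape (N, d),
# a dense adjacency matrix A of shape (N, N), and an additive edge bias.
import numpy as np

def graph_attention(X, A, Wq, Wk, Wv, edge_bias=1.0):
    """Mix global context (full attention) with local structure (edge bias)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # global pairwise scores
    scores = scores + edge_bias * A               # bias attention toward edges
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ V                               # aggregated node features

# Tiny usage example: random features on a 4-node path graph.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out = graph_attention(X, A, Wq, Wk, Wv)           # (4, 8) updated features
```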
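The Kalman-filter waypoint smoothing can be pictured as a standard predict/update loop over the policy's raw waypoints. The sketch below is an assumption-laden illustration, not the paper's implementation: it assumes 2D waypoints, a constant-velocity motion model, and isotropic noise, with `process_var` and `meas_var` as illustrative parameters.

```python
# Minimal sketch of waypoint smoothing with a constant-velocity Kalman
# filter. The state layout [x, y, vx, vy] and the noise models are
# assumptions; the paper does not specify them in this summary.
import numpy as np

def smooth_waypoints(waypoints, dt=1.0, process_var=1e-2, meas_var=1e-1):
    """Filter an (N, 2) array of raw waypoints into a smoother path."""
    F = np.array([[1, 0, dt, 0],      # state transition: constant velocity
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]])
    H = np.array([[1, 0, 0, 0],       # only positions are observed
                  [0, 1, 0, 0]])
    Q = process_var * np.eye(4)       # process noise (assumed isotropic)
    R = meas_var * np.eye(2)          # measurement noise on raw waypoints
    x = np.array([*waypoints[0], 0.0, 0.0])  # initial state, zero velocity
    P = np.eye(4)
    smoothed = [np.asarray(waypoints[0], dtype=float)]
    for z in waypoints[1:]:
        # Predict step: propagate state and covariance through the model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct with the next raw waypoint as a measurement.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        smoothed.append(x[:2])
    return np.asarray(smoothed)
```

A smaller `meas_var` keeps the filtered path close to the raw waypoints, while a smaller `process_var` trusts the constant-velocity model more and smooths harder; that trade-off is what makes the output easier for the robot to track.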