Embodied Escaping: End-to-End Reinforcement Learning for Robot Navigation in Narrow Environment

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of vacuum cleaning robots becoming trapped in narrow, cluttered indoor environments and struggling to recover autonomously, this paper proposes an end-to-end reinforcement learning navigation framework. The method integrates multi-sensor perception (LiDAR, IMU, and wheel encoders) and employs Proximal Policy Optimization (PPO) for policy learning. Key contributions include: (1) a reparameterized action space based on a unified turning radius to enhance maneuverability; (2) a dynamic action masking mechanism that balances decision accuracy and real-time responsiveness; and (3) a hybrid reward training paradigm tailored to sparse-reward navigation tasks. Evaluated across multi-level real-world scenarios, the framework achieves significantly higher escape success rates than both classical planning-based and state-of-the-art RL approaches, reducing collision frequency by 62% and shortening average escape time by 41%.
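The unified-turning-radius reparameterization summarized above can be sketched as follows. This is an illustrative reading of the idea, not the paper's implementation: the radius, speed, and action labels are assumed values, and the wheel-speed conversion is the standard differential-drive kinematics.

```python
# Illustrative sketch: every discrete action shares one turning radius R,
# differing only in arc direction (left/straight/right) and motion sense
# (forward/reverse). All constants below are assumptions, not paper values.

R = 0.30  # unified turning radius in meters (assumed)
V = 0.20  # linear speed in m/s (assumed)

# Each action is (label, linear velocity v, angular velocity w).
ACTIONS = [
    ("forward-left",     V,  V / R),
    ("forward-straight", V,  0.0),
    ("forward-right",    V, -V / R),
    ("reverse-left",    -V, -V / R),
    ("reverse-straight", -V, 0.0),
    ("reverse-right",   -V,  V / R),
]

def to_wheel_speeds(v, w, track_width=0.25):
    """Convert a (v, w) command to left/right wheel speeds
    for a differential-drive robot (track width assumed)."""
    vl = v - w * track_width / 2.0
    vr = v + w * track_width / 2.0
    return vl, vr
```

Sharing one turning radius keeps the discrete action set small and removes redundant arcs, which is how the summary's claim of improved maneuverability with fewer ineffective options can be understood.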

📝 Abstract
Autonomous navigation is a fundamental task for robot vacuum cleaners in indoor environments. Since their core function is to clean entire areas, robots inevitably encounter dead zones in cluttered and narrow scenarios. Existing planning methods often fail to escape due to complex environmental constraints, high-dimensional search spaces, and high-difficulty maneuvers. To address these challenges, this paper proposes an embodied escaping model that leverages a reinforcement learning-based policy with an efficient action mask for dead-zone escaping. To alleviate the issue of sparse rewards in training, we introduce a hybrid training policy that improves learning efficiency. To handle redundant and ineffective action options, we design a novel action representation that reshapes the discrete action space with a uniform turning radius. Furthermore, we develop an action mask strategy to select valid actions quickly, balancing precision and efficiency. In real-world experiments, our robot is equipped with a LiDAR, an IMU, and two wheel encoders. Extensive quantitative and qualitative experiments across varying difficulty levels demonstrate that our robot can consistently escape from challenging dead zones. Moreover, our approach significantly outperforms competing path planning and reinforcement learning methods in terms of success rate and collision avoidance.
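The action-mask strategy in the abstract can be illustrated with a minimal, generic sketch: actions judged invalid (e.g. predicted to collide) have their policy logits set to negative infinity before the softmax, so the policy assigns them zero probability. The function below shows the general technique only; it is not the paper's code, and the validity check is left to the caller.

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    """Return action probabilities with invalid actions zeroed out.

    logits     : 1-D array of raw policy scores.
    valid_mask : 1-D boolean array, True where the action is allowed.
    """
    # Set invalid logits to -inf so exp(-inf) = 0 after the softmax.
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Because masked actions receive exactly zero probability, the policy can never sample a known-invalid maneuver, which is one way to balance decision precision with real-time responsiveness as the abstract describes.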
Problem

Research questions and friction points this paper is trying to address.

Autonomous robot navigation in narrow, cluttered environments
Efficient escaping from dead zones using reinforcement learning
Improving learning efficiency with hybrid training and action masks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning-based policy for navigation
Hybrid training policy to enhance learning efficiency
Action mask strategy for quick valid action selection
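As one reading of the hybrid training policy listed above, a sparse terminal reward (escape success, collision) can be combined with a dense progress-shaping term to ease learning under sparse rewards. The function and all weights below are illustrative assumptions, not values from the paper.

```python
def hybrid_reward(reached_goal, collided, dist_prev, dist_now,
                  w_progress=1.0, r_success=10.0, r_collision=-5.0,
                  step_penalty=-0.01):
    """Sparse terminal rewards plus dense progress shaping (illustrative).

    dist_prev / dist_now : distance to the escape target before and
    after the step, so (dist_prev - dist_now) rewards approaching it.
    """
    r = step_penalty                          # discourage dawdling
    r += w_progress * (dist_prev - dist_now)  # dense shaping term
    if collided:
        r += r_collision                      # sparse failure signal
    if reached_goal:
        r += r_success                        # sparse success signal
    return r
```

The dense term gives the agent a gradient to follow long before it ever escapes, while the sparse terms keep the optimum aligned with the true task objective.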