🤖 AI Summary
To address low sample efficiency, poor generalization, and difficult sim-to-real transfer for wheeled robots on vertically challenging terrain, this paper proposes VertiSelector (VS), an automated curriculum learning framework. The method trains reinforcement learning policies in VW-Chrono, a high-fidelity simulator built on the Chrono multi-physics engine, and is validated on the real-world Verti-4-Wheeler platform. Its core contribution is a Temporal Difference (TD) error-driven terrain sampling mechanism that prioritizes terrain with high TD errors when revisited, keeping the robot learning at the edge of its evolving capabilities and generating a progressive curriculum without manual environment sequencing. Experiments demonstrate a 23.08% improvement in success rate, along with robust generalization from simulation to physical deployment.
📝 Abstract
Reinforcement Learning (RL) has the potential to enable extreme off-road mobility by circumventing complex kinodynamic modeling, planning, and control through simulated end-to-end trial-and-error learning experiences. However, most RL methods are sample-inefficient when training across a large number of manually designed simulation environments and struggle to generalize to the real world. To address these issues, we introduce VertiSelector (VS), an automatic curriculum learning framework designed to enhance learning efficiency and generalization by selectively sampling training terrain. VS prioritizes vertically challenging terrain with higher Temporal Difference (TD) errors when revisited, thereby allowing robots to learn at the edge of their evolving capabilities. By dynamically adjusting the sampling focus, VS significantly boosts sample efficiency and generalization within the VW-Chrono simulator built on the Chrono multi-physics engine. Furthermore, we present simulation and physical results using VS on a Verti-4-Wheeler platform. These results demonstrate that VS achieves a 23.08% improvement in success rate by sampling efficiently during training and generalizing robustly to the real world.
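The core idea of prioritizing terrain by TD error can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's actual implementation: the class name `TerrainSelector`, the softmax weighting, and the per-terrain update rule are all hypothetical stand-ins for whatever VS does internally.

```python
import math
import random

class TerrainSelector:
    """Toy sketch of TD-error-prioritized terrain sampling.

    Terrains whose most recent revisit produced a large |TD error| are
    sampled more often, so training concentrates on terrain at the edge
    of the policy's current capabilities. All details here (optimistic
    initialization, softmax with temperature) are illustrative guesses.
    """

    def __init__(self, num_terrains: int, temperature: float = 1.0):
        # Optimistic initialization so every terrain gets sampled early on.
        self.td_errors = [1.0] * num_terrains
        self.temperature = temperature

    def sample(self) -> int:
        # Softmax over stored TD errors: higher error -> higher probability.
        scaled = [e / self.temperature for e in self.td_errors]
        m = max(scaled)  # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(weights)), weights=weights, k=1)[0]

    def update(self, terrain_idx: int, episode_td_error: float) -> None:
        # After revisiting a terrain, record the magnitude of the TD error
        # observed there; this drives the next round of sampling.
        self.td_errors[terrain_idx] = abs(episode_td_error)


# Usage: terrains the policy still predicts poorly dominate the curriculum.
selector = TerrainSelector(num_terrains=3)
selector.update(0, 5.0)   # hard terrain: large TD error
selector.update(1, 0.1)   # mastered terrain: small TD error
selector.update(2, 0.1)
chosen = selector.sample()  # index 0 is now the most likely draw
```

In a full training loop, `update` would be called with TD errors logged by the RL learner each time a terrain is revisited, so the sampling distribution tracks the policy's evolving capability boundary.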