🤖 AI Summary
Problem: Learning-based controllers for wheeled rovers track dynamic waypoints unreliably on unstructured granular planetary terrain, because simulated wheel-terrain dynamics differ from the real ones (the sim-to-real gap).
Method: This paper proposes an end-to-end simulation-to-reality (Sim2Real) transfer framework that combines procedurally generated environments, massively parallel reinforcement learning (RL) with randomized physics parameters, action smoothing filters, and zero-shot transfer to a physical rover, so that deployment does not depend on fine-tuning with high-fidelity particle-based physics.
Contribution/Results: Experiments on a physical wheeled rover in a lunar-analogue facility systematically compare multiple RL algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Agents trained with procedural diversity achieve better zero-shot performance than agents trained on static scenarios. Fine-tuning with high-fidelity particle physics yields only minor gains in low-speed precision, at a significant computational cost. Together, these results establish a validated workflow for learning-based navigation on granular terrain.
📝 Abstract
Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. Together, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a critical step towards deploying autonomous robots in the final frontier.
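The abstract refers to action smoothing filters without naming them here. As an illustration only, the sketch below shows one common choice, an exponential moving average (EMA) over successive policy actions. The class name `EmaActionFilter`, the parameter `alpha`, and the two-element action vector are assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class EmaActionFilter:
    """Hypothetical EMA smoother for policy actions (not the paper's filter).

    alpha = 1.0 passes actions through unchanged; smaller values smooth more.
    """

    alpha: float = 0.3
    _state: Optional[List[float]] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")

    def reset(self) -> None:
        # Call at the start of each episode so old actions do not leak in.
        self._state = None

    def __call__(self, action: Sequence[float]) -> List[float]:
        if self._state is None:
            # First action of the episode initializes the filter state.
            self._state = list(action)
        else:
            # Blend the new action with the previous smoothed action.
            self._state = [
                self.alpha * a + (1.0 - self.alpha) * s
                for a, s in zip(action, self._state)
            ]
        return list(self._state)


if __name__ == "__main__":
    smoother = EmaActionFilter(alpha=0.5)
    # Assumed action layout for illustration: [linear velocity, angular velocity].
    for raw in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
        print(smoother(raw))
```

A filter of this kind trades responsiveness for smoothness: a lower `alpha` damps abrupt command changes but delays the rover's reaction to a new waypoint, which is one reason comparing filters on the physical platform is useful.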