🤖 AI Summary
Existing perceptive reinforcement learning (RL) controllers for legged robots suffer from two key limitations: reliance on oscillator or inverse-kinematics gait priors, which constrain the action space and bias policy optimization, or "blind" operation, which leads to poor terrain anticipation (especially under the hind legs) and low robustness to noise. This paper proposes PGTT (Phase-Guided Terrain Traversal), a perception-aware deep-RL method that removes gait priors from the action space, enforcing gait structure purely through reward shaping instead. PGTT encodes per-leg phase as a cubic Hermite spline whose swing height adapts to LiDAR-derived local heightmap statistics, adds a swing-phase contact penalty, and outputs actions directly in joint space. By discarding action-space gait priors, PGTT improves generalization and robustness. Experiments show a 7.5% median increase in success rate under push disturbances, a 9% improvement in discrete-obstacle traversal, and roughly twofold faster policy convergence, along with successful real-world deployment on a Unitree Go2 and preliminary results on ANYmal-C.
📝 Abstract
State-of-the-art perceptive Reinforcement Learning controllers for legged robots either (i) impose oscillator- or IK-based gait priors that constrain the action space, bias policy optimization, and reduce adaptability across robot morphologies, or (ii) operate "blind", struggling to anticipate terrain under the hind legs and remaining brittle to noise. In this paper, we propose Phase-Guided Terrain Traversal (PGTT), a perception-aware deep-RL approach that overcomes these limitations by enforcing gait structure purely through reward shaping, thereby reducing inductive bias in policy learning compared to oscillator/IK-conditioned action priors. PGTT encodes per-leg phase as a cubic Hermite spline that adapts swing height to local heightmap statistics and adds a swing-phase contact penalty, while the policy acts directly in joint space, supporting morphology-agnostic deployment. Trained in MuJoCo (MJX) on procedurally generated stair-like terrains with curriculum learning and domain randomization, PGTT achieves the highest success rate under push disturbances (median +7.5% vs. the next-best method) and on discrete obstacles (+9%), with comparable velocity tracking, and converges to an effective policy roughly 2x faster than strong end-to-end baselines. We validate PGTT on a Unitree Go2 using a real-time LiDAR elevation-to-heightmap pipeline, and we report preliminary results on ANYmal-C obtained with the same hyperparameters. These findings indicate that terrain-adaptive, phase-guided reward shaping is a simple and general mechanism for robust perceptive locomotion across platforms.
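To make the phase-encoding idea concrete, here is a minimal sketch of a phase-indexed cubic Hermite swing-height profile. The paper does not publish this code; the function and parameter names (`swing_height`, `terrain_roughness`, `gain`) are illustrative assumptions, and the terrain statistic stands in for whatever heightmap feature the actual method uses. Two Hermite segments (lift-off to apex, apex to touchdown) give zero vertical velocity at both ends of the swing, and the apex is raised on rougher terrain.

```python
def hermite(p0, m0, p1, m1, t):
    """Cubic Hermite interpolation between p0 and p1 on t in [0, 1],
    with endpoint derivatives m0 and m1."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * m1)

def swing_height(phase, base_apex=0.08, terrain_roughness=0.0, gain=0.5):
    """Foot swing-height target (m) for swing phase in [0, 1].

    'terrain_roughness' is a hypothetical local heightmap statistic
    (e.g. std of nearby cells); rougher terrain raises the apex so
    the foot clears obstacles.
    """
    apex = base_apex + gain * terrain_roughness
    if phase < 0.5:
        # lift-off -> apex: starts and ends with zero vertical velocity
        return hermite(0.0, 0.0, apex, 0.0, phase / 0.5)
    # apex -> touchdown: descends back to ground level
    return hermite(apex, 0.0, 0.0, 0.0, (phase - 0.5) / 0.5)
```

A swing-phase contact penalty would then be applied whenever a foot reports ground contact while its phase variable is in the swing interval, which is how the method shapes the gait without constraining the action space.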