PGTT: Phase-Guided Terrain Traversal for Perceptive Legged Locomotion

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing perceptive reinforcement learning (RL) controllers for legged robots tend to fall into one of two camps, each with its own limitation: those that rely on oscillator or inverse-kinematics (IK) gait priors, which constrain the action space and bias policy optimization, and those that operate "blind", which struggle to anticipate the terrain under the hind legs and are brittle to noise. This paper proposes PGTT, a perception-aware deep RL method that enforces gait structure purely through reward shaping instead of through action-space priors. PGTT encodes per-leg phase as a cubic Hermite spline whose swing height adapts to local heightmap statistics, adds a swing-phase contact penalty, and has the policy output joint-space actions directly. In simulation, PGTT achieves the highest success rate under push disturbances (median +7.5% over the next-best method) and on discrete obstacles (+9%), with comparable velocity tracking, and converges roughly 2x faster than strong end-to-end baselines. The method is validated on a Unitree Go2 using a real-time LiDAR heightmap pipeline, with preliminary results reported on ANYmal-C.

📝 Abstract
State-of-the-art perceptive Reinforcement Learning controllers for legged robots either (i) impose oscillator or IK-based gait priors that constrain the action space, add bias to the policy optimization, and reduce adaptability across robot morphologies, or (ii) operate "blind", struggling to anticipate hind-leg terrain and remaining brittle to noise. In this paper, we propose Phase-Guided Terrain Traversal (PGTT), a perception-aware deep-RL approach that overcomes these limitations by enforcing gait structure purely through reward shaping, thereby reducing inductive bias in policy learning compared to oscillator/IK-conditioned action priors. PGTT encodes per-leg phase as a cubic Hermite spline that adapts swing height to local heightmap statistics and adds a swing-phase contact penalty, while the policy acts directly in joint space, supporting morphology-agnostic deployment. Trained in MuJoCo (MJX) on procedurally generated stair-like terrains with curriculum and domain randomization, PGTT achieves the highest success under push disturbances (median +7.5% vs. the next best method) and on discrete obstacles (+9%), with comparable velocity tracking, and converges to an effective policy roughly 2x faster than strong end-to-end baselines. We validate PGTT on a Unitree Go2 using a real-time LiDAR elevation-to-heightmap pipeline, and we report preliminary results on ANYmal-C obtained with the same hyperparameters. These findings indicate that terrain-adaptive, phase-guided reward shaping is a simple and general mechanism for robust perceptive locomotion across platforms.
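
To make the phase-to-swing-height idea in the abstract concrete, here is a minimal Python sketch of a cubic Hermite swing-height reference whose apex grows with the relief of a local heightmap patch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the stance/swing split at phase 0.5, the use of the patch height range as the terrain statistic, and the clearance and margin values are all hypothetical.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation from p0 (t=0) to p1 (t=1) with end tangents m0, m1."""
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def swing_height_target(phase, local_heights,
                        base_clearance=0.06, margin=0.02, swing_start=0.5):
    """Reference foot height (m) above the ground for one leg at gait phase in [0, 1).

    Assumed convention: phase < swing_start is stance (target 0), the rest is swing.
    All default values are illustrative, not taken from the paper.
    """
    if phase < swing_start:
        return 0.0
    # Assumed terrain statistic: height range of the heightmap patch near the foot.
    relief = max(local_heights) - min(local_heights)
    peak = base_clearance + relief + margin
    # Normalised swing progress in [0, 1).
    s = (phase - swing_start) / (1.0 - swing_start)
    if s < 0.5:
        # Lift-off to apex, zero vertical velocity at both ends.
        return hermite(0.0, peak, 0.0, 0.0, s / 0.5)
    # Apex to touchdown.
    return hermite(peak, 0.0, 0.0, 0.0, (s - 0.5) / 0.5)
```

Calling `swing_height_target(0.75, [0.0, 0.15])` returns the apex for a patch containing a 15 cm step, while the same phase on flat ground returns only the base clearance plus margin. In a reward-shaping setup, a reference like this would be compared against the measured foot height; it would not be commanded to the joints directly.
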
Problem

Research questions and friction points this paper is trying to address.

Overcoming constraints of oscillator-based gait priors in legged robot locomotion
Addressing blind controllers' inability to anticipate hind-leg terrain obstacles
Reducing inductive bias in policy learning for morphology-agnostic deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enforces gait structure through reward shaping
Encodes leg phase as adaptive cubic Hermite spline
Acts directly in joint space for morphology-agnostic deployment
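
The reward-shaping idea listed above can be sketched as a per-step reward term that only touches the reward and leaves the joint-space action untouched. This is a hedged illustration, not the paper's reward: the function name, the penalty weight, and the Gaussian height-tracking term are assumptions introduced here, and only the existence of a swing-phase contact penalty comes from the source.

```python
import math


def phase_guided_reward(in_swing, foot_contact, foot_height, target_height,
                        contact_penalty=1.0, tracking_sigma=0.02):
    """Sum a shaping reward over legs; every argument is a per-leg sequence.

    in_swing      : bool, True if the leg's phase says it should be in the air
    foot_contact  : bool, True if the foot is touching the ground
    foot_height   : measured foot height above the terrain (m)
    target_height : reference swing height for the current phase (m)
    Weights and the Gaussian tracking kernel are illustrative assumptions.
    """
    total = 0.0
    for swing, contact, h, h_ref in zip(in_swing, foot_contact,
                                        foot_height, target_height):
        if not swing:
            continue  # stance legs contribute nothing in this sketch
        if contact:
            total -= contact_penalty  # penalise touching down during swing
        # Reward staying close to the phase-dependent reference height.
        total += math.exp(-((h - h_ref) ** 2) / (2.0 * tracking_sigma ** 2))
    return total
```

Because the gait structure enters only through a term like this, the policy network itself is free to output raw joint targets, which is what the morphology-agnostic claim relies on.
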
Alexandros Ntagkas
Laboratory of Automation and Robotics (LAR) in the Department of Electrical & Computer Engineering, University of Patras, GR-26504 Patras, Greece
Chairi Kiourt
Athena - Research and Innovation Center in Information, Communication and Knowledge Technologies, Xanthi, Greece
Konstantinos Chatzilygeroudis
Assistant Professor at University of Patras
Robot Learning, Evolutionary Robotics, Reinforcement Learning, Robotics, Evolutionary Computation