NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Trajectory optimization for bipedal robot locomotion lacks adaptability, while reinforcement learning (RL) approaches suffer from dependence on intricate reward engineering. Method: This paper proposes NaviGait, a hierarchical control framework that decouples high-level motion planning from low-level feedback control. Offline, a navigable gait library is constructed via hybrid zero-dynamics-based optimization, yielding structured and interpretable reference gaits. Online, a lightweight residual policy applies joint-level and velocity-command corrections in real time. Contribution/Results: This design simplifies the reward function to only tracking-error and stability terms, improving training stability and policy interpretability. Experiments demonstrate that NaviGait achieves faster convergence than conventional RL and imitation learning methods, produces motions closely aligned with the reference trajectories, and exhibits significantly improved robustness to external disturbances.

📝 Abstract
Reinforcement learning (RL) has emerged as a powerful method to learn robust control policies for bipedal locomotion. Yet, it can be difficult to tune desired robot behaviors due to unintuitive and complex reward design. In comparison, offline trajectory optimization methods, like Hybrid Zero Dynamics, offer more tuneable, interpretable, and mathematically grounded motion plans for high-dimensional legged systems. However, these methods often remain brittle to real-world disturbances like external perturbations. In this work, we present NaviGait, a hierarchical framework that combines the structure of trajectory optimization with the adaptability of RL for robust and intuitive locomotion control. NaviGait leverages a library of offline-optimized gaits and smoothly interpolates between them to produce continuous reference motions in response to high-level commands. The policy provides both joint-level and velocity command residual corrections to modulate and stabilize the reference trajectories in the gait library. One notable advantage of NaviGait is that it dramatically simplifies reward design by encoding rich motion priors from trajectory optimization, reducing the need for finely tuned shaping terms and enabling more stable and interpretable learning. Our experimental results demonstrate that NaviGait enables faster training compared to conventional and imitation-based RL, and produces motions that remain closest to the original reference. Overall, by decoupling high-level motion generation from low-level correction, NaviGait offers a more scalable and generalizable approach for achieving dynamic and robust locomotion.
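The abstract describes the two-layer structure: a library of offline-optimized gaits interpolated into a continuous reference motion, with a learned policy adding joint-level and velocity-command residuals on top. A minimal sketch of that composition, assuming a simple velocity-indexed library with linear blending (the class and function names here are illustrative, not from the paper):

```python
import numpy as np

class GaitLibrary:
    """Illustrative gait library: offline-optimized joint trajectories
    indexed by commanded forward velocity, blended linearly between
    neighboring entries to yield a continuous reference motion."""

    def __init__(self, velocities, gaits):
        # gaits has shape (n_velocities, n_phase, n_joints): joint
        # references sampled over one gait cycle per library velocity.
        self.velocities = np.asarray(velocities, dtype=float)
        self.gaits = np.asarray(gaits, dtype=float)

    def reference(self, phase, velocity):
        # Clip the command into the library's range, then blend between
        # the two neighboring optimized gaits.
        v = float(np.clip(velocity, self.velocities[0], self.velocities[-1]))
        hi = min(max(int(np.searchsorted(self.velocities, v)), 1),
                 len(self.velocities) - 1)
        lo = hi - 1
        w = (v - self.velocities[lo]) / (self.velocities[hi] - self.velocities[lo])
        gait = (1.0 - w) * self.gaits[lo] + w * self.gaits[hi]
        # Cyclic linear interpolation in gait phase, phase in [0, 1).
        n = gait.shape[0]
        s = (phase % 1.0) * n
        i = int(s) % n
        j = (i + 1) % n
        f = s - int(s)
        return (1.0 - f) * gait[i] + f * gait[j]

def control_step(library, residual_policy, obs, phase, command):
    """One control tick: the policy outputs joint-level (dq) and
    velocity-command (dv) residuals that modulate the reference."""
    dq, dv = residual_policy(obs)
    q_ref = library.reference(phase, command + dv)
    return q_ref + dq  # corrected joint targets for the low-level controller
```

This is only a sketch of the decoupling the paper argues for: the library supplies the motion prior, so the residual policy can be trained with a reward reduced to tracking-error and stability terms.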
Problem

Research questions and friction points this paper is trying to address.

RL locomotion policies are hard to tune due to unintuitive and complex reward design
Offline trajectory optimization (e.g., Hybrid Zero Dynamics) yields interpretable gaits but remains brittle to real-world disturbances
How to combine the structure of trajectory optimization with the adaptability of RL for robust locomotion control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining trajectory optimization with reinforcement learning
Leveraging gait libraries for continuous reference motions
Providing joint-level and velocity-command residual corrections