Chasing Stability: Humanoid Running via Control Lyapunov Function Guided Reinforcement Learning

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the nonlinear, hybrid dynamics and real-time control demands of high-speed bipedal running, this paper proposes CLF-RL: a framework that embeds control Lyapunov functions (CLFs) into the reward shaping of reinforcement learning (RL). Together with optimized, dynamically feasible reference trajectories, the CLF-shaped reward removes the need to handcraft and tune heuristic reward terms while encouraging certifiable stability and providing meaningful intermediate rewards to guide learning. The resulting controller unifies the flight and single-support phases of running and executes closed-loop using only onboard sensors. Experiments demonstrate stable running on a treadmill and on diverse outdoor terrain, robustness to disturbances applied to the torso and feet, and accurate tracking of global reference commands. The key contribution is the tight integration of CLF theory into the RL training pipeline, bridging stability certification from control theory with data-driven adaptability while remaining efficient enough for real-time deployment.
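The reward-shaping idea can be illustrated with a minimal sketch. Assuming a quadratic CLF V(e) = eᵀPe on the tracking error and a finite-difference approximation of the decrease condition V̇ ≤ −λV, a per-step reward term might penalize violations as follows (the function name, P, λ, and timestep here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def clf_reward(err, err_next, P, lam=1.0, dt=0.002):
    """Penalize violation of a discrete-time CLF decrease condition.

    err, err_next: tracking errors against the reference trajectory at
    consecutive control steps; P defines the quadratic CLF V(e) = e^T P e.
    """
    V = err @ P @ err                    # CLF value at the current step
    V_next = err_next @ P @ err_next    # CLF value one step later
    dV = (V_next - V) / dt              # finite-difference estimate of dV/dt
    violation = max(0.0, dV + lam * V)  # > 0 when dV/dt <= -lam*V fails
    return -violation                   # zero when V decreases fast enough
```

Adding such a term to the RL objective rewards the policy for driving V down at rate λ along rollouts, which is roughly what "encouraging certifiable stability" amounts to in reward form.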

📝 Abstract
Achieving highly dynamic behaviors on humanoid robots, such as running, requires controllers that are both robust and precise, and hence difficult to design. Classical control methods offer valuable insight into how such systems can stabilize themselves, but synthesizing real-time controllers for nonlinear and hybrid dynamics remains challenging. Recently, reinforcement learning (RL) has gained popularity for locomotion control due to its ability to handle these complex dynamics. In this work, we embed ideas from nonlinear control theory, specifically control Lyapunov functions (CLFs), along with optimized dynamic reference trajectories into the reinforcement learning training process to shape the reward. This approach, CLF-RL, eliminates the need to handcraft and tune heuristic reward terms, while simultaneously encouraging certifiable stability and providing meaningful intermediate rewards to guide learning. By grounding policy learning in dynamically feasible trajectories, we expand the robot's dynamic capabilities and enable running that includes both flight and single support phases. The resulting policy operates reliably on a treadmill and in outdoor environments, demonstrating robustness to disturbances applied to the torso and feet. Moreover, it achieves accurate global reference tracking utilizing only on-board sensors, taking a critical step toward integrating these dynamic motions into a full autonomy stack.
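For context, the "certifiable stability" the abstract refers to is the standard control Lyapunov function condition from nonlinear control: a positive-definite V whose time derivative can always be driven below a fraction of its value. In the usual control-affine setting (a textbook statement; the paper's hybrid-dynamics formulation may differ in detail):

```latex
% Exponential CLF condition for control-affine dynamics
% \dot{x} = f(x) + g(x)u
\exists\, u \in \mathcal{U} : \quad
\dot{V}(x, u) \;=\; L_f V(x) + L_g V(x)\, u \;\le\; -\lambda V(x),
\qquad \lambda > 0
```

Enforcing this inequality along every closed-loop trajectory certifies exponential stability; CLF-RL instead rewards its satisfaction during training, trading a hard guarantee for compatibility with model-free RL.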
Problem

Research questions and friction points this paper is trying to address.

Designing robust controllers for humanoid robot running with dynamic stability
Addressing nonlinear hybrid dynamics in real-time locomotion control synthesis
Eliminating heuristic reward tuning while ensuring certifiable stability guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

Control Lyapunov Function guided reinforcement learning
Optimized dynamic reference trajectories shaping reward
Policy learning grounded in dynamically feasible trajectories (a minimal sketch follows this list)
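To make the last two points concrete, below is a minimal sketch of how a precomputed, dynamically feasible reference might be indexed by gait phase and turned into the tracking error that the CLF-shaped reward acts on. The array layout, phase convention, and function names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def reference_state(traj, phase):
    """Interpolate a reference state from a (T, n) trajectory array.

    `traj` is a precomputed, dynamically feasible gait cycle (single
    support plus flight); `phase` in [0, 1) indexes into the cycle.
    """
    T = traj.shape[0]
    s = (phase % 1.0) * T
    i = int(s) % T
    j = (i + 1) % T                       # wrap around at the cycle end
    w = s - int(s)
    return (1.0 - w) * traj[i] + w * traj[j]

def tracking_error(x, traj, phase):
    # This error is what a CLF-shaped reward (see the earlier sketch) acts on.
    return x - reference_state(traj, phase)
```

Grounding the reward in such a trajectory gives the policy a dense, dynamically consistent target at every step, rather than a sparse goal it must discover on its own.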