CLF-RL: Control Lyapunov Function Guided Reinforcement Learning

📅 2025-08-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of laborious reward function design and insufficient policy robustness in reinforcement learning (RL) control for bipedal robots, this paper proposes a control Lyapunov function (CLF)-based reward shaping framework. Methodologically, it integrates a linear inverted pendulum model with a hybrid zero dynamics gait library to generate structured reference trajectories; CLFs are then employed to construct intermediate rewards that explicitly penalize state tracking errors while guaranteeing asymptotic convergence. The framework provides structured guidance during training and incurs zero additional computational overhead at deployment. Experiments demonstrate significantly improved RL training efficiency and superior disturbance rejection and stability over baseline RL policies in both simulation and real-world tests on the Unitree G1 robot—outperforming conventional trajectory-tracking reward formulations. The core contribution lies in the principled integration of CLF theory with RL reward design, enabling data-driven control policies with provable stability guarantees.

📝 Abstract
Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards, and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.
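The CLF-based shaping described above can be illustrated with a minimal sketch. This is not the paper's implementation: the quadratic Lyapunov candidate `V(e) = eᵀPe`, the gain `lam`, and the penalty on violating the exponential-decay condition `V̇ + λV ≤ 0` are illustrative assumptions consistent with standard CLF constructions.

```python
import numpy as np

def clf_reward(e, e_dot, P, lam=2.0):
    """Illustrative CLF-style shaping reward (not the paper's exact form).

    V(e) = e^T P e is a quadratic Lyapunov candidate on the tracking
    error e. The reward penalizes both the value of V and any violation
    of the exponential decay condition V_dot + lam * V <= 0, which
    encourages rapid convergence to the reference trajectory.
    """
    V = e @ P @ e
    V_dot = 2.0 * e @ P @ e_dot          # d/dt (e^T P e) for symmetric P
    decay_violation = max(0.0, V_dot + lam * V)
    return -V - decay_violation

# Toy usage: smaller tracking error (with matching decay) earns a higher reward.
P = np.eye(2)
r_far = clf_reward(np.array([0.5, 0.3]), np.array([-0.5, -0.3]), P)
r_near = clf_reward(np.array([0.05, 0.03]), np.array([-0.05, -0.03]), P)
```

Because the shaping term enters only the training reward, nothing of this computation survives into the deployed policy, matching the paper's claim of zero added overhead at deployment.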
Problem

Research questions and friction points this paper is trying to address.

Reward function design for bipedal RL control is laborious and error-prone
Learned locomotion policies are sensitive to poorly shaped objectives and lack robustness
Unstructured rewards give weak intermediate guidance on tracking convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages CLFs for structured reward shaping
Uses LIP and HZD for trajectory generation
Lightweight policy deployment post-training
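As a sketch of the reduced-order planning side, the LIP model has a closed-form center-of-mass solution that can serve as a reference trajectory. The pendulum height `z0` and step duration `T` below are placeholder values, not parameters reported for the Unitree G1.

```python
import numpy as np

def lip_com_trajectory(x0, v0, z0=0.7, g=9.81, T=0.4, n=5):
    """Closed-form LIP center-of-mass trajectory (illustrative sketch).

    The linear inverted pendulum model gives x_ddot = (g / z0) * x about
    the stance foot. With omega = sqrt(g / z0), the solution is
        x(t) = x0 * cosh(omega t) + (v0 / omega) * sinh(omega t).
    Returns sampled CoM positions and velocities over one step of length T.
    """
    omega = np.sqrt(g / z0)
    t = np.linspace(0.0, T, n)
    x = x0 * np.cosh(omega * t) + (v0 / omega) * np.sinh(omega * t)
    v = x0 * omega * np.sinh(omega * t) + v0 * np.cosh(omega * t)
    return x, v
```

In the paper's pipeline such reference trajectories (from the LIP planner or the HZD gait library) feed the CLF reward during training only, so the deployed policy carries no planner.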