ContractionPPO: Certified Reinforcement Learning via Differentiable Contraction Layers

📅 2026-03-20
🤖 AI Summary
This work addresses the challenge of designing control strategies for legged robots in unstructured environments that simultaneously achieve high performance and formal robustness guarantees. The authors propose embedding a differentiable, state-dependent contraction metric into a Proximal Policy Optimization (PPO) reinforcement learning framework to jointly optimize policy performance and incremental exponential stability of the closed-loop system. The contraction metric is parameterized by a trainable Lipschitz neural network, implemented as an auxiliary head trained in parallel with the policy; this enables end-to-end learning while a worst-case upper bound on the contraction rate supports formal stability verification. Hardware experiments on a quadruped robot demonstrate that the proposed method achieves stable locomotion under strong disturbances, offering both high robustness and certifiable stability, along with strong sim-to-real transferability.
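For readers unfamiliar with contraction analysis, the stability notion the summary refers to can be stated in its standard form (this is the textbook condition from contraction theory, not necessarily the paper's exact formulation): a state-dependent metric $M(x) \succ 0$ certifies incremental exponential stability of the closed-loop dynamics $\dot{x} = f_{\mathrm{cl}}(x)$ at rate $\lambda > 0$ if

$$\dot{M}(x) + M(x)\,\frac{\partial f_{\mathrm{cl}}}{\partial x} + \left(\frac{\partial f_{\mathrm{cl}}}{\partial x}\right)^{\!\top} M(x) \preceq -2\lambda\, M(x),$$

so that the distance between any two trajectories, measured in the metric $M$, shrinks at least as fast as $e^{-\lambda t}$.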

📝 Abstract
Legged locomotion in unstructured environments demands not only high-performance control policies but also formal guarantees to ensure robustness under perturbations. Control methods often require carefully designed reference trajectories, which are challenging to construct in high-dimensional, contact-rich systems such as quadruped robots. In contrast, Reinforcement Learning (RL) directly learns policies that implicitly generate motion, and uniquely benefits from access to privileged information, such as full state and dynamics during training, that is not available at deployment. We present ContractionPPO, a framework for certified robust planning and control of legged robots by augmenting Proximal Policy Optimization (PPO) RL with a state-dependent contraction metric layer. This approach enables the policy to maximize performance while simultaneously producing a contraction metric that certifies incremental exponential stability of the simulated closed-loop system. The metric is parameterized as a Lipschitz neural network and trained jointly with the policy, either in parallel or as an auxiliary head of the PPO backbone. While the contraction metric is not deployed during real-world execution, we derive upper bounds on the worst-case contraction rate and show that these bounds ensure the learned contraction metric generalizes from simulation to real-world deployment. Our hardware experiments on quadruped locomotion demonstrate that ContractionPPO enables robust, certifiably stable control even under strong external perturbations.
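The abstract describes parameterizing the metric as a Lipschitz neural network trained as an auxiliary head of the PPO backbone. The following is a minimal sketch of that architectural idea, not the authors' implementation: all dimensions and layer sizes are illustrative assumptions, spectral normalization stands in for whichever Lipschitz constraint the paper uses, and the metric head outputs a Cholesky factor so that $M(x)$ is positive definite by construction.

```python
# Hedged sketch of a PPO network with a Lipschitz-constrained metric head.
# STATE_DIM, ACT_DIM, HID are illustrative, not taken from the paper.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, HID = 12, 4, 64


def lipschitz_mlp(in_dim: int, out_dim: int) -> nn.Module:
    """MLP whose linear layers are spectrally normalized, bounding its
    Lipschitz constant (one possible way to enforce the Lipschitz property)."""
    return nn.Sequential(
        nn.utils.spectral_norm(nn.Linear(in_dim, HID)),
        nn.Tanh(),
        nn.utils.spectral_norm(nn.Linear(HID, out_dim)),
    )


class ContractionPPONet(nn.Module):
    """Shared PPO backbone with a policy head and an auxiliary metric head
    that produces a state-dependent symmetric positive-definite matrix M(x)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(STATE_DIM, HID), nn.Tanh())
        self.policy_head = nn.Linear(HID, ACT_DIM)
        # Auxiliary head: outputs entries of a lower-triangular Cholesky factor.
        self.metric_head = lipschitz_mlp(HID, STATE_DIM * STATE_DIM)

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)
        action_mean = self.policy_head(h)
        L = self.metric_head(h).view(-1, STATE_DIM, STATE_DIM).tril()
        # M = L L^T + eps*I is symmetric positive definite by construction.
        M = L @ L.transpose(-1, -2) + 1e-3 * torch.eye(STATE_DIM)
        return action_mean, M
```

In training, a contraction-violation penalty computed from M(x) would be added to the usual PPO surrogate objective; as the abstract notes, the metric head is only used for certification and is not needed at deployment.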
Problem

Research questions and friction points this paper is trying to address.

legged locomotion
robustness
certified stability
reinforcement learning
contraction theory
Innovation

Methods, ideas, or system contributions that make the work stand out.

ContractionPPO
certified reinforcement learning
contraction metric
legged locomotion
robust control