🤖 AI Summary
This work addresses the transient performance degradation and slow recovery that robotic systems suffer in real-world environments when unobserved dynamics shifts, such as actuator degradation, mass variation, or contact changes, occur mid-episode. To enable rapid adaptation at inference time without retraining or privileged disturbance information, the authors propose a stability-aligned residual control architecture in which a policy trained under nominal dynamics stays frozen and adaptation occurs only through a bounded additive residual channel. The core component is the Stability Alignment Gate (SAG), which regulates corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation. This design preserves the nominal closed-loop structure while accelerating compensation. Evaluated on the Go1, Cassie, H1, and Scout platforms, the method reduces average recovery time by 87%, 48%, 30%, and 20%, respectively, relative to a frozen SAC policy, while maintaining near-nominal steady-state performance.
📝 Abstract
Robotic systems operating in real-world environments inevitably encounter unobserved dynamics shifts during continuous execution, including changes in actuation, mass distribution, or contact conditions. When such shifts occur mid-episode, even locally stabilizing learned policies can experience substantial transient performance degradation. While input-to-state stability guarantees bounded state deviation, it does not ensure rapid restoration of task-level performance. We address inference-time recovery under frozen policy parameters by casting adaptation as constrained disturbance shaping around a nominal stabilizing controller. We propose a stability-aligned residual control architecture in which a reinforcement learning policy trained under nominal dynamics remains fixed at deployment, and adaptation occurs exclusively through a bounded additive residual channel. A Stability Alignment Gate (SAG) regulates corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation. These mechanisms preserve the nominal closed-loop structure while enabling rapid compensation for unobserved dynamics shifts without retraining or privileged disturbance information. Across mid-episode perturbations including actuator degradation, mass variation, and contact changes, the proposed method consistently reduces recovery time relative to frozen and online-adaptation baselines while maintaining near-nominal steady-state performance. Recovery time is reduced by 87% on the Go1 quadruped, 48% on the Cassie biped, 30% on the H1 humanoid, and 20% on the Scout wheeled platform on average across evaluated conditions relative to a frozen SAC policy.
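The four gating mechanisms named above can be illustrated with a minimal sketch. The abstract does not give the gate's equations, so everything here is an assumption made for illustration: the function name `stability_alignment_gate`, the L2-ball projection, the rule that removes the component opposing the nominal action, the scalar performance signal, and all thresholds and rates are hypothetical and are not the authors' formulation.

```python
import numpy as np


def stability_alignment_gate(u_nom, r_raw, perf, perf_nominal, gain,
                             r_max=0.5, perf_tol=0.05, gain_rate=0.1):
    """Gate a raw residual; returns (gated_residual, updated_gain).

    Every rule and default value below is an illustrative assumption.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    r = np.asarray(r_raw, dtype=float)

    # 1) Magnitude constraint: project the residual onto an L2 ball of radius r_max.
    norm = np.linalg.norm(r)
    if norm > r_max:
        r = r * (r_max / norm)

    # 2) Directional coherence: drop any component that opposes the nominal action.
    dot = float(r @ u_nom)
    u_sq = float(u_nom @ u_nom)
    if dot < 0.0 and u_sq > 0.0:
        r = r - (dot / u_sq) * u_nom

    # 3) Performance-conditioned activation: act only on a measurable deficit.
    active = (perf_nominal - perf) > perf_tol

    # 4) Adaptive gain modulation: ramp up while active, decay otherwise.
    step = gain_rate if active else -gain_rate
    gain = float(np.clip(gain + step, 0.0, 1.0))

    return gain * r, gain


# Usage: the frozen policy's action is untouched; only the gated residual is added.
u_nom = np.array([1.0, 0.0])
residual, gain = stability_alignment_gate(
    u_nom, [-3.0, 4.0], perf=0.5, perf_nominal=1.0, gain=0.0)
u_applied = u_nom + residual
```

In this sketch the residual can never exceed `r_max` in norm, because the removal in step 2 and the gain in step 4 only shrink it. The gain decays back to zero once performance returns to its nominal level, so the applied action reverts to the frozen policy's output.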