🤖 AI Summary
This work proposes a cerebellum-inspired residual control framework for online adaptation of deployed robotic policies under actuator failures, dynamics changes, or environmental perturbations. At inference time, the approach augments a frozen reinforcement learning policy with online action corrections, enabling fault recovery without altering the original policy parameters. Key innovations include high-dimensional pattern separation, parallel microzone residual pathways, multi-timescale local error-driven plasticity, and a conservative meta-adaptation mechanism that balances recovery speed with behavioral stability. Evaluated on MuJoCo benchmarks, the method improves performance by up to 66% on HalfCheetah-v5 and 53% on Humanoid-v5 under moderate faults, and degrades gracefully under severe disturbances. Consolidating persistent residual corrections into the policy parameters provides complementary robustness.
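The core mechanism the summary describes, a frozen base policy augmented by an additive residual correction that adapts online from a local error signal through an eligibility trace, can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the class name, the feature vector `phi`, and the hyperparameters `lr` (learning rate), `lam` (trace decay), and `alpha` (residual authority) are all assumptions for exposition.

```python
import numpy as np

class ResidualController:
    """Sketch of cerebellar-style residual control (assumed interface,
    not the authors' code): the base policy stays frozen; a linear
    residual on expanded features is adapted online via a local
    error-driven eligibility-trace update."""

    def __init__(self, act_dim, feat_dim, lr=0.01, lam=0.9, alpha=0.1):
        self.W = np.zeros((act_dim, feat_dim))      # residual weights
        self.trace = np.zeros((act_dim, feat_dim))  # eligibility trace
        self.lr, self.lam, self.alpha = lr, lam, alpha
        self.last_phi = np.zeros(feat_dim)

    def act(self, base_action, phi):
        # Additive correction: the frozen policy's action is never
        # overwritten, only nudged; alpha caps residual authority.
        self.last_phi = phi
        return base_action + self.alpha * (self.W @ phi)

    def update(self, error):
        # error: per-actuator performance error (shape: act_dim).
        # Decay the trace, stamp in the current features, then apply a
        # local weight update -- no global policy-gradient step.
        self.trace = self.lam * self.trace + np.outer(error, self.last_phi)
        self.W -= self.lr * self.trace
```

In a toy setting with a constant actuator offset, the residual drives the tracking error toward zero while the base policy remains untouched; with `W` initialized to zero, nominal behavior is exactly preserved until an error signal appears.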
📝 Abstract
Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, or system identification is impractical. We introduce an inference-time, cerebellar-inspired residual control framework that augments a frozen reinforcement learning policy with online corrective actions, enabling fault recovery without modifying base policy parameters. The framework instantiates core cerebellar principles, including high-dimensional pattern separation via fixed feature expansion, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces operating at distinct time scales. These mechanisms enable fast, localized correction under post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks under actuator, dynamics, and environmental perturbations show improvements of up to $+66\%$ on \texttt{HalfCheetah-v5} and $+53\%$ on \texttt{Humanoid-v5} under moderate faults, with graceful degradation under severe shifts and complementary robustness from consolidating persistent residual corrections into policy parameters.