🤖 AI Summary
Humanoid robots face significant challenges in achieving real-time, generalized autonomous recovery and standing after falling in dynamic environments. This paper proposes an end-to-end deep reinforcement learning (DRL) framework that unifies fall recovery and standing behaviors in a single policy. To accelerate training, the method leverages the CrossQ algorithm, a sample-efficient off-policy DRL method, and integrates a simulation-to-real transfer strategy to enhance deployment robustness. The approach is implemented on the Sigmaban hardware platform and demonstrates strong adaptability to external disturbances. It overcomes key limitations of conventional model predictive control (MPC) and keyframe-based (KFB) approaches in both real-time performance (response latency < 800 ms) and generalization capability. Experimental evaluation shows a statistically significant improvement in recovery success rate over the KFB method employed by Rhoban, the RoboCup 2023 KidSize World Champion, validating the effectiveness and engineering feasibility of unified policy modeling for post-fall recovery.
📝 Abstract
Humanoid robotics faces significant challenges in achieving stable locomotion and recovering from falls in dynamic environments. Traditional methods, such as Model Predictive Control (MPC) and Key Frame Based (KFB) routines, either require extensive fine-tuning or lack real-time adaptability. This paper introduces FRASA, a Deep Reinforcement Learning (DRL) agent that integrates fall recovery and stand-up strategies into a unified framework. Leveraging the Cross-Q algorithm, FRASA significantly reduces training time and offers a versatile recovery strategy that adapts to unpredictable disturbances. Comparative tests on Sigmaban humanoid robots demonstrate FRASA's superior performance against the KFB method deployed by the Rhoban Team, world champion of the RoboCup 2023 KidSize League.
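To make the training speedup concrete, the key idea behind CrossQ is to drop the target network used in standard off-policy critics and instead run the current and next (state, action) batches through the critic in a single joint forward pass, so that batch-normalization statistics are shared across both. The sketch below is a toy, hedged illustration of that critic update in plain NumPy; the network, dimensions, and function names are illustrative assumptions, not the FRASA authors' implementation (which would also use twin critics and gradient-based training).

```python
# Toy sketch of a CrossQ-style critic update (assumption: illustrative only,
# not the authors' code). Key idea: no target network; current and next
# (state, action) pairs share one forward pass and its batch-norm statistics.
import numpy as np

rng = np.random.default_rng(0)

def batchnorm(x, eps=1e-5):
    # Normalize over the joint batch (current AND next pairs together).
    mu, var = x.mean(axis=0), x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

class TinyCritic:
    def __init__(self, in_dim, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))

    def q(self, sa):  # sa: (batch, in_dim) state-action pairs
        h = np.tanh(batchnorm(sa @ self.W1))
        return (h @ self.W2).ravel()

def crossq_td_targets(critic, s, a, s2, a2, r, gamma=0.99):
    # Joint forward pass: concatenate current and next pairs so the
    # batch-norm layer sees both distributions at once.
    sa = np.concatenate([s, a], axis=1)
    sa2 = np.concatenate([s2, a2], axis=1)
    q_all = critic.q(np.concatenate([sa, sa2], axis=0))
    q, q_next = q_all[: len(s)], q_all[len(s):]
    # TD target from the SAME network (in training, q_next would be
    # treated as a stop-gradient quantity).
    targets = r + gamma * q_next
    return q, targets, targets - q

# Toy batch: 8 transitions with 4-D states and 2-D actions.
s, a = rng.normal(size=(8, 4)), rng.normal(size=(8, 2))
s2, a2 = rng.normal(size=(8, 4)), rng.normal(size=(8, 2))
r = rng.normal(size=8)
q, targets, td_error = crossq_td_targets(TinyCritic(6), s, a, s2, a2, r)
```

In a full training loop, the TD error would drive gradient updates on the critic while an actor network is trained against it; the point here is only the shared-batch, target-network-free structure that CrossQ uses to cut training time.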