AI Summary
This work addresses the challenges of poor motion stability, tight multi-task coupling, and difficult state transitions in bipedal soccer robots during dynamic interactions. To overcome these issues, the authors propose a modular reinforcement learning framework that combines an open-loop feedforward oscillator, which generates basic locomotion patterns, with a reinforcement-learning-trained feedback residual policy that handles complex soccer-specific maneuvers. A posture-driven finite state machine enables seamless, interference-free switching between ball-seeking/kicking and fall-recovery behaviors. The recovery policy is trained efficiently with a progressive force-decay curriculum learning strategy. Evaluated in Unity simulation, the system demonstrates strong spatial adaptability, reliably executing kicking tasks in confined spaces and achieving rapid autonomous fall recovery with an average recovery time of 0.715 seconds.
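The split between a feedforward oscillator, an RL residual, and a posture-driven switch can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' implementation: the sinusoidal oscillator form, the parameters `amplitude`, `offset`, `phase`, `frequency_hz`, the residual `scale` with clipping to [-1, 1], and the tilt thresholds `fall_deg`/`upright_deg` are all hypothetical. The hysteresis band in `PostureFSM` is one common way to keep a two-state switch from chattering between the ball-seeking/kicking and recovery networks.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Oscillator:
    """Hypothetical open-loop oscillator: sinusoidal nominal joint targets."""
    amplitude: Sequence[float]    # per-joint amplitude in rad (assumed)
    offset: Sequence[float]       # per-joint neutral angle in rad (assumed)
    phase: Sequence[float]        # per-joint phase offset in rad (assumed)
    frequency_hz: float = 1.5     # gait frequency (assumed)

    def __call__(self, t: float) -> List[float]:
        w = 2.0 * math.pi * self.frequency_hz
        return [o + a * math.sin(w * t + p)
                for a, o, p in zip(self.amplitude, self.offset, self.phase)]


def compose_action(feedforward: Sequence[float], residual: Sequence[float],
                   scale: float = 0.3) -> List[float]:
    """Joint target = feedforward pattern + scaled policy residual clipped to [-1, 1]."""
    return [ff + scale * max(-1.0, min(1.0, r))
            for ff, r in zip(feedforward, residual)]


class PostureFSM:
    """Two-state switch between BSKN and FRN driven by torso tilt, with hysteresis."""

    def __init__(self, fall_deg: float = 60.0, upright_deg: float = 20.0):
        self.fall = math.radians(fall_deg)        # assumed: tilt that triggers FRN
        self.upright = math.radians(upright_deg)  # assumed: tilt that hands back to BSKN
        self.state = "BSKN"

    def step(self, roll: float, pitch: float) -> str:
        tilt = max(abs(roll), abs(pitch))
        if self.state == "BSKN" and tilt > self.fall:
            self.state = "FRN"
        elif self.state == "FRN" and tilt < self.upright:
            self.state = "BSKN"
        return self.state
```

In a control loop, each tick would query `PostureFSM.step` first, then run only the selected network; with BSKN active, the network's output would be passed as `residual` to `compose_action` on top of the oscillator's targets.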
Abstract
Developing bipedal football robots for dynamic adversarial environments presents challenges related to motion stability, the deep coupling of multiple tasks, and control switching between different states such as upright walking and fall recovery. To address these problems, this paper proposes a modular reinforcement learning (RL) framework for adaptive multi-task control. First, the framework combines an open-loop feedforward oscillator with an RL-based feedback residual policy, effectively separating the generation of basic gaits from complex football actions. Second, a posture-driven state machine switches cleanly between the ball-seeking and kicking network (BSKN) and the fall recovery network (FRN), fundamentally preventing interference between states. The FRN is trained efficiently with a progressive force-attenuation curriculum learning strategy. The architecture was verified in Unity simulations of bipedal robots, demonstrating excellent spatial adaptability (reliably finding and kicking the ball even in restricted corner scenarios) and rapid autonomous fall recovery, with an average recovery time of 0.715 seconds. The framework thus supports seamless and stable operation in complex multi-task environments.
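The abstract does not say how the assistive force is attenuated during FRN training. The sketch below assumes one plausible scheme: an upward helper force on the torso that is multiplied by a decay factor each time the recent recovery success rate clears a threshold, and that snaps to zero once it becomes small so the final policy recovers unaided. The class name `ForceDecayCurriculum` and all of its parameters (`initial_force`, `decay`, `promote_at`, `min_force`, `window`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForceDecayCurriculum:
    """Hypothetical success-gated schedule for an assistive force during FRN training."""
    initial_force: float = 200.0   # N, starting helper force (assumed)
    decay: float = 0.7             # multiplicative decay per promotion (assumed)
    promote_at: float = 0.8        # success rate needed to reduce the force (assumed)
    min_force: float = 1.0         # forces below this snap to zero (assumed)
    window: int = 100              # episodes per evaluation window (assumed)
    force: float = field(init=False)
    stage: int = field(init=False, default=0)
    _results: List[bool] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.force = self.initial_force

    def report(self, recovered: bool) -> float:
        """Record one episode outcome and return the force for the next episode."""
        self._results.append(recovered)
        if len(self._results) >= self.window:
            rate = sum(self._results) / len(self._results)
            self._results.clear()
            # Attenuate only when the policy is reliably succeeding at this level.
            if rate >= self.promote_at and self.force > 0.0:
                self.force *= self.decay
                if self.force < self.min_force:
                    self.force = 0.0  # final stage: recovery without assistance
                self.stage += 1
        return self.force
```

Gating each reduction on success rate, rather than on a fixed episode count, keeps the task from getting harder before the policy has mastered the current level. A purely time-based decay would be an equally valid alternative under this reading of the abstract.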