🤖 AI Summary
This work addresses balance maintenance and the constrained manipulation envelope in bimanual manipulation for humanoid robots subject to external hand forces. The authors propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a latent context encoding upper-body joint configuration and bimanual interaction forces. Training combines spherically sampled 3D force perturbations on each hand with a curriculum over upper-body postures. At deployment, interaction forces are estimated online from the robot dynamics, so no wrist-mounted force/torque sensors are required. In simulation, the method raises mean standing success to 73.84%, compared with 51.40% for the curriculum-only baseline and 29.44% for the base policy. Real-world experiments on a full-scale Unitree H12 humanoid demonstrate robust behavior under both asymmetric single-arm and symmetric bimanual loading.
📝 Abstract
Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we inject disturbances in simulation by applying diverse, spherically sampled 3D forces to each hand, combined with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H12 humanoid and evaluate robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load. Code and videos are available at https://fame10.github.io/Fame/
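
To make the "spherically sampled 3D forces" idea concrete, here is a minimal sketch of one common way to draw such perturbations. It is not taken from the FAME code. The function names, the uniform magnitude distribution, the 50 N cap, and the independent sampling per hand are all assumptions made for illustration. The direction is drawn uniformly on the unit sphere by normalizing a vector of three independent Gaussian samples.

```python
import math
import random


def sample_hand_force(max_magnitude, rng):
    """Return a 3D force [fx, fy, fz] with a uniformly random direction.

    Direction: normalize three independent standard normal samples, which
    gives a point distributed uniformly on the unit sphere.
    Magnitude: uniform in [0, max_magnitude] (an assumed choice, not
    something stated in the abstract).
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # guard against a (vanishingly rare) zero vector
            break
    magnitude = rng.uniform(0.0, max_magnitude)
    return [magnitude * c / norm for c in v]


def sample_bimanual_forces(max_magnitude, rng):
    """Sample independent forces for the left and right hand (hypothetical helper)."""
    return {
        "left": sample_hand_force(max_magnitude, rng),
        "right": sample_hand_force(max_magnitude, rng),
    }


# Example: one disturbance draw with an assumed 50 N cap per hand.
rng = random.Random(0)
forces = sample_bimanual_forces(max_magnitude=50.0, rng=rng)
```

Normalizing a Gaussian vector avoids the clustering toward the poles that results from sampling the two spherical angles uniformly, so no force direction is favored during training.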