AI Summary
This work addresses the limited generative adaptability of general-purpose humanoid controllers, which must track commands precisely while withstanding strong disturbances; under such disturbances, existing methods often fail in unnatural or brittle ways. To overcome this, we propose Heracles, a state-conditioned diffusion middleware that sits between high-level reference motions and low-level physical controllers and dynamically balances high-fidelity motion tracking against generative recovery behaviors. Heracles employs a diffusion model as an implicit adaptive intermediate layer, enabling seamless transitions between tracking and generation without explicit mode switching. By integrating state-conditioned diffusion modeling, physics-based simulation control, and zero-shot trajectory synthesis, Heracles significantly enhances robustness under extreme perturbations, yielding natural, human-like recovery motions and advancing humanoid control from rigid tracking toward an open-ended generative paradigm.
Abstract
Achieving general-purpose humanoid control requires a delicate balance between the precise execution of commanded motions and the flexible, anthropomorphic adaptability needed to recover from unpredictable environmental perturbations. Current general controllers predominantly formulate motion control as a rigid reference-tracking problem. While effective in nominal conditions, these trackers often exhibit brittle, non-anthropomorphic failure modes under severe disturbances, lacking the generative adaptability inherent to human motor control. To overcome this limitation, we propose Heracles, a novel state-conditioned diffusion middleware that bridges precise motion tracking and generative synthesis. Rather than relying on rigid tracking paradigms or complex explicit mode-switching, Heracles operates as an intermediary layer between high-level reference motions and low-level physics trackers. By conditioning on the robot's real-time state, the diffusion model implicitly adapts its behavior: it approximates an identity map when the state closely aligns with the reference, preserving zero-shot tracking fidelity. Conversely, when encountering significant state deviations, it seamlessly transitions into a generative synthesizer to produce natural, anthropomorphic recovery trajectories. Our framework demonstrates that integrating generative priors into the control loop not only significantly enhances robustness against extreme perturbations but also elevates humanoid control from a rigid tracking paradigm to an open-ended, generative general-purpose architecture.
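The two limiting behaviors described above (identity-like output near the reference, a modified target under large deviation) can be illustrated with a toy sketch. This is not Heracles' diffusion model: it is a hand-written stand-in with an explicit smooth weight, whereas the abstract describes a learned model that adapts implicitly. The function name `toy_middleware`, the `scale` parameter, and the midpoint "recovery target" are all hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence


def toy_middleware(
    reference: Sequence[float],
    state: Sequence[float],
    scale: float = 0.5,  # hypothetical deviation scale, not from the paper
) -> List[float]:
    """Toy stand-in for a state-conditioned layer between reference and tracker.

    Assumption: a hand-written smooth weight replaces the learned diffusion
    model, purely to show the two limiting behaviors.
    """
    # Euclidean deviation between the commanded reference and the measured state.
    deviation = math.sqrt(sum((r - s) ** 2 for r, s in zip(reference, state)))

    # Smooth weight: 0 when state matches the reference, approaching 1 for
    # large deviations. There is no hard threshold or discrete mode switch.
    w = 1.0 - math.exp(-((deviation / scale) ** 2))

    # Hypothetical recovery target: the midpoint between state and reference,
    # standing in for a generated recovery trajectory.
    return [(1.0 - w) * r + w * 0.5 * (r + s) for r, s in zip(reference, state)]


# State equal to the reference: the reference passes through unchanged.
on_track = toy_middleware([0.0, 1.0], [0.0, 1.0])

# Large deviation: the target handed to the tracker moves away from the
# raw reference toward an intermediate pose.
perturbed = toy_middleware([0.0, 0.0], [10.0, 0.0])
```

In this sketch the output would be passed to the low-level tracker in place of the raw reference. In the actual framework, the mapping from reference and state to the tracker's target is produced by the state-conditioned diffusion model rather than a closed-form blend.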