🤖 AI Summary
To address the limited locomotion robustness of humanoid robots in unstructured environments, this paper proposes a decoupled, multi-timescale hierarchical control architecture: a high-frequency proprioceptive stabilizer (pretrained blind) handles low-level stabilization, while a lightweight perception encoder drives low-frequency semantic decision-making at the upper level. A two-stage curriculum, stabilizer pretraining followed by perceptual fine-tuning, yields strong generalization from minimal perceptual input. The method is validated in MuJoCo simulation and on a physical Unitree G1, where it significantly outperforms end-to-end and single-stage baselines, achieving stable walking on challenging terrains such as stairs and ledges and addressing key bottlenecks in dynamic balance and perception-action coordination. Core contributions: (1) a temporally decoupled control design; (2) a two-stage training framework; (3) closed-loop robustness validation on real hardware.
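To make the timescale decoupling concrete, here is a minimal control-loop sketch. The rates, the latent interface, and all function names are illustrative assumptions, not the paper's published code.

```python
# Hypothetical rates; the paper's actual control frequencies may differ.
STABILIZER_HZ = 200                          # high-frequency blind proprioceptive loop
PERCEPTION_HZ = 10                           # low-frequency perceptual loop
DECIMATION = STABILIZER_HZ // PERCEPTION_HZ  # stabilizer ticks per perception update

def layered_step(tick, proprio, terrain_obs, perception_policy, stabilizer, latent):
    """One high-rate control tick of the layered architecture.

    The perceptual policy refreshes a compact command latent only every
    DECIMATION ticks; the stabilizer runs on every tick, conditioned on
    proprioception plus the most recent latent.
    """
    if tick % DECIMATION == 0:
        latent = perception_policy(terrain_obs, proprio)  # slow semantic decision
    action = stabilizer(proprio, latent)                  # fast stabilization
    return action, latent
```

The point of the split is that the fast loop never blocks on perception: if the upper layer is slow or noisy, the stabilizer keeps balancing against the last latent it received.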
📝 Abstract
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), in which a proprioceptive stabilizer running at a high rate is coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even with minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
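A hedged skeleton of the two-stage curriculum described above. The stage budgets, the `env.collect*` interface, and the choice to hold the stabilizer fixed in stage two are assumptions for illustration, not the authors' implementation.

```python
STAGE1_ITERS = 1000  # assumed budget for blind stabilizer pretraining
STAGE2_ITERS = 300   # assumed budget for perceptual fine-tuning

def train_layered_policy(env, stabilizer, perception_policy, rl_update):
    """Two-stage curriculum sketch: blind pretraining, then perceptual fine-tuning.

    `env.collect`, `env.collect_layered`, and `rl_update` are hypothetical
    interfaces standing in for the usual rollout-and-update RL loop.
    """
    # Stage 1: train the stabilizer blind (proprioception only) on randomized
    # terrain so it learns robust low-level balance and recovery.
    for _ in range(STAGE1_ITERS):
        rollout = env.collect(policy=stabilizer, exteroception=False)
        rl_update(stabilizer, rollout)

    # Stage 2: keep the pretrained stabilizer fixed and fine-tune only the
    # compact perceptual policy that issues low-rate latent commands on top.
    for _ in range(STAGE2_ITERS):
        rollout = env.collect_layered(high=perception_policy, low=stabilizer)
        rl_update(perception_policy, rollout)
```

Freezing the low level in stage two is one plausible way to preserve the pretrained stabilization behavior while the perceptual layer adapts; the paper's exact fine-tuning scheme may differ.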