HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Vision-Language-Action (VLA) models struggle to coordinate whole-body motion on high-degree-of-freedom (high-DoF) humanoid robots because they control body parts largely independently, which often leads to instability. This work proposes HEX, a state-centric framework that integrates vision-language instructions with proprioceptive dynamics through four components: a humanoid-aligned universal state representation, a Mixture-of-Experts Unified Proprioceptive Predictor, lightweight history tokens, and a residual-gated fusion mechanism. A flow-matching action head then generates coherent whole-body actions from the fused representation. On real-world humanoid manipulation tasks, HEX achieves state-of-the-art task success rates and cross-embodiment generalization, particularly in fast-reaction and long-horizon scenarios.
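
To make the Mixture-of-Experts idea concrete, here is a minimal sketch of a generic softmax-gated top-k expert layer applied to a proprioceptive state vector. It is not the HEX predictor: the function name `moe_forward`, the linear experts, the dimensions, and the choice of `top_k=2` are all illustrative assumptions, and this summary does not specify the actual architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(state, gate_w, expert_ws, top_k=2):
    """Hypothetical top-k Mixture-of-Experts layer (illustrative only).

    state:     (d,) proprioceptive state vector
    gate_w:    (d, n_experts) router weights
    expert_ws: (n_experts, d, d_out) one linear expert per slice
    """
    logits = state @ gate_w                   # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = softmax(logits[chosen])         # renormalize over chosen experts
    outputs = np.stack([state @ expert_ws[i] for i in chosen])  # (top_k, d_out)
    return weights @ outputs, chosen, weights

# Toy dimensions, assumed purely for illustration.
rng = np.random.default_rng(0)
d, n_experts, d_out = 8, 4, 8
state = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d_out))

pred, chosen, weights = moe_forward(state, gate_w, expert_ws)
```

Only the selected experts are evaluated, so the per-step cost depends on `top_k` rather than on the total number of experts.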

📝 Abstract

Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.
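
The last two components can be illustrated with a small sketch, under stated assumptions. `residual_gated_fusion` adds a sigmoid-gated projection of the proprioceptive feature to the vision-language feature as a residual, and `sample_action` integrates a velocity field from Gaussian noise at t = 0 to an action at t = 1 with Euler steps, which is the usual way a flow-matching head is sampled. All names, shapes, and weights here are hypothetical, and `toy_velocity` is a closed-form stand-in for a learned network, not the model described in the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_gated_fusion(vl, prop, w_gate, w_proj):
    """Hypothetical fusion: vl + gate * proj(prop).

    vl: (d,) vision-language feature, prop: (p,) proprioceptive feature,
    w_gate: (d + p, d), w_proj: (p, d).
    """
    gate = sigmoid(np.concatenate([vl, prop]) @ w_gate)  # per-channel gate in (0, 1)
    return vl + gate * (prop @ w_proj)                   # residual keeps vl intact

def sample_action(cond, velocity_fn, action_dim, steps=10, rng=None):
    """Euler integration of da/dt = v(a, t, cond) from t = 0 to t = 1."""
    if rng is None:
        rng = np.random.default_rng()
    a = rng.normal(size=action_dim)   # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        a = a + dt * velocity_fn(a, t, cond)
    return a

# Toy setup; every weight below is a random stand-in for learned parameters.
rng = np.random.default_rng(0)
d, p, action_dim = 6, 4, 3
vl, prop = rng.normal(size=d), rng.normal(size=p)
w_gate = rng.normal(size=(d + p, d))
w_proj = rng.normal(size=(p, d))
w_out = rng.normal(size=(d, action_dim))

def toy_velocity(a, t, cond):
    # Conditional velocity of the straight-line path toward a fixed target:
    # v = (target - a) / (1 - t). A trained network would replace this.
    return (cond @ w_out - a) / (1.0 - t)

fused = residual_gated_fusion(vl, prop, w_gate, w_proj)
action = sample_action(fused, toy_velocity, action_dim, steps=10, rng=rng)
```

With this toy velocity field the Euler integrator lands on `fused @ w_out` regardless of the initial noise, which makes the sketch easy to check. The residual form means the fused feature reduces to the vision-language feature whenever the gated proprioceptive term is zero.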

Problem

Research questions and friction points this paper is trying to address.

whole-body manipulation

humanoid robots

cross-embodiment

coordinated control

high-DoF

Innovation

Methods, ideas, or system contributions that make the work stand out.

Humanoid-Aligned State Representation

Mixture-of-Experts

Whole-Body Coordination