HoloMotion-1 Technical Report

📅 2026-05-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limited behavioral diversity and poor generalization of conventional humanoid motion tracking methods that rely heavily on motion capture data. To overcome these limitations, we propose a humanoid motion foundation model trained on a large-scale heterogeneous motion corpus that combines in-the-wild video reconstructions, motion capture datasets, and in-house data. A sparsely activated Mixture-of-Experts Transformer with KV-cache inference, together with a sequence-level training strategy, enables efficient temporal modeling and real-time control. On multiple unseen motion benchmarks, the model significantly improves tracking accuracy over prior methods, and it deploys directly on a real humanoid robot without task-specific fine-tuning.
πŸ“ Abstract
In this report, we present HoloMotion-1, a humanoid motion foundation model for zero-shot whole-body motion tracking. A key innovation of HoloMotion-1 is to scale control-policy training with a large-scale hybrid motion corpus, where video-reconstructed motions from in-the-wild videos provide the dominant source of motion diversity, while curated motion-capture and in-house motion data provide higher-fidelity supervision and deployment-oriented coverage. This data regime enables HoloMotion-1 to move beyond conventional MoCap-only training and exposes the policy to substantially broader behaviors, capture conditions, and motion styles. Learning from such heterogeneous data introduces new challenges, including reconstruction noise, source-domain mismatch, uneven motion quality, and the need for temporal modeling under large behavioral variation. To address these challenges, HoloMotion-1 integrates large-capacity temporal modeling, a sparsely activated Mixture-of-Experts Transformer with KV-cache inference for real-time control, and a sequence-level training strategy that improves learning efficiency on extended motion sequences. Extensive experiments on multiple unseen motion benchmarks show that HoloMotion-1 generalizes robustly across diverse motion types and capture conditions, significantly improves tracking accuracy over prior methods, and transfers directly to a real humanoid robot without task-specific fine-tuning.
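To illustrate why KV-cache inference suits real-time control, here is a minimal sketch in plain Python. It is not the HoloMotion-1 implementation: the class `KVCache`, the helper `attend`, and the single-head, projection-free setup are assumptions made for illustration. The idea shown is that each new control step appends one key/value pair and attends over the stored history, instead of recomputing attention for the whole sequence.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Hypothetical single-head cache: stores past keys and values."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append only the newest key/value, then attend over the full history.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def full_causal_attention(queries, keys, values):
    """Reference path: recompute causal attention for every time step."""
    return [
        attend(queries[t], keys[: t + 1], values[: t + 1])
        for t in range(len(queries))
    ]


# Toy per-frame features; a real model would first apply learned projections.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
streamed = [cache.step(f, f, f) for f in frames]
full = full_causal_attention(frames, frames, frames)
```

In this sketch `streamed` and `full` hold the same outputs, but the cached path processes one new frame per step, which is the property that matters for a fixed-rate control loop.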
Problem

Research questions and friction points this paper is trying to address.

zero-shot motion tracking
humanoid motion
heterogeneous motion data
motion reconstruction
temporal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
zero-shot motion tracking
Mixture-of-Experts Transformer
hybrid motion corpus
real-time humanoid control
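As a rough illustration of what "sparsely activated" means for a Mixture-of-Experts layer, the sketch below routes an input to its top-k experts and mixes their outputs. The report summary does not specify the gating scheme, so top-k softmax gating is an assumption here, and `top_k_route`, `moe_forward`, the gate matrix, and the toy experts are all hypothetical names and values.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k largest gate logits and softmax-normalize over them only."""
    top = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, gate_weights, experts, k=2):
    """Run only the k selected experts and sum their weighted outputs."""
    # Linear gate: one logit per expert (assumed form, not from the report).
    gate_logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy experts standing in for feed-forward sub-networks.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [-v for v in x],
    lambda x: [0.0 for _ in x],
]
gate_weights = [[5.0, 0.0], [-5.0, 0.0], [5.0, 0.0]]
routes = top_k_route([0.1, 2.0, -1.0, 2.0], 2)
mixed = moe_forward([1.0, 0.0], gate_weights, experts, k=2)
```

Because only k experts run per input, compute per step grows with k rather than with the total number of experts, which is the usual motivation for pairing large capacity with real-time inference.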