SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

📅 2025-11-11
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing neural controllers for humanoid robots suffer from limited capacity, poor behavioral generalization, and insufficient training scale. Method: This paper proposes a general-purpose whole-body control foundation model grounded in motion tracking. It introduces a unified motion token space and a real-time kinematic planner to support multimodal inputs (VR teleoperation, human video, and vision-language-action models), and trains a 42-million-parameter network on over 100 million high-quality motion-capture frames using roughly 9,000 GPU-hours, leveraging dense supervision and human motion priors rather than manual reward engineering. Contribution/Results: The model substantially improves motion naturalness and cross-task robustness and generalizes to unseen behaviors. Performance improves consistently as model size, dataset volume, and compute budget grow, validating motion tracking at scale as a practical paradigm for humanoid robot control.

📝 Abstract
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited behavior set, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
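
The abstract does not expose implementation details, but the "unified token space" can be pictured as a single per-frame reference-motion format that every input interface (VR teleoperation, human video, VLA output) is encoded into before reaching one shared tracking policy. The sketch below is a minimal illustration under that assumption; all names (MotionToken, encode_vr, encode_video, TrackingPolicy) and the joint count are hypothetical, not the paper's actual interfaces.

```python
# Hypothetical sketch of a unified motion-token interface (names are illustrative,
# not from the paper). Each input modality is converted to the same token format,
# so a single tracking policy can consume VR teleoperation, video, or VLA outputs.
from dataclasses import dataclass
import numpy as np

N_JOINTS = 29  # assumed humanoid joint count, purely illustrative


@dataclass
class MotionToken:
    """One frame of reference motion shared across all input interfaces."""
    root_pos: np.ndarray    # (3,) target root position
    root_rot: np.ndarray    # (4,) target root orientation, quaternion (w, x, y, z)
    joint_pos: np.ndarray   # (N_JOINTS,) target joint angles


def encode_vr(device_poses: dict) -> MotionToken:
    """Retarget sparse VR device poses (head, hands) to a full-body reference frame."""
    # Placeholder: a real encoder would run retargeting/IK from the sparse poses.
    return MotionToken(np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(N_JOINTS))


def encode_video(human_pose: np.ndarray) -> MotionToken:
    """Map a human pose estimated from video onto the robot skeleton."""
    return MotionToken(np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(N_JOINTS))


class TrackingPolicy:
    """Stand-in for the learned whole-body controller that tracks MotionTokens."""

    def act(self, proprioception: np.ndarray, token: MotionToken) -> np.ndarray:
        # A trained network would map (robot state, reference frame) -> joint actions.
        return np.zeros(N_JOINTS)


if __name__ == "__main__":
    policy = TrackingPolicy()
    # The same policy serves either interface because both produce MotionTokens.
    token = encode_vr({"head": np.eye(4), "left_hand": np.eye(4), "right_hand": np.eye(4)})
    action = policy.act(np.zeros(64), token)
    print(action.shape)  # (29,)
```

The design point the abstract emphasizes is that the tracking policy itself is shared; only the per-interface encoders change.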
Problem

Research questions and friction points this paper is trying to address.

Scaling up humanoid control models beyond current limited parameter sizes
Creating natural whole-body movements without manual reward engineering
Developing a universal controller for multiple motion input interfaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaled model capacity from 1.2M to 42M parameters
Trained on over 100M frames (700 hours) of high-quality motion-capture data
Real-time universal kinematic planner that bridges motion tracking to downstream task execution (see the sketch after this list)
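
As a rough picture of how the kinematic planner above could bridge tasks to the tracking policy, the toy loop below has a planner emit a kinematic reference each control tick, which the tracking policy converts into joint actions. This is a hedged sketch under assumed interfaces; class names, dimensions, and the loop structure are illustrative and not taken from the paper.

```python
# Hypothetical planner -> tracker control loop: the planner produces a per-step
# kinematic reference toward a task goal, and the tracking policy turns each
# reference into joint actions. All names and sizes are illustrative assumptions.
import numpy as np

N_JOINTS = 29          # assumed joint count, illustrative only
CONTROL_STEPS = 100    # number of control ticks to run in this toy loop


class KinematicPlanner:
    """Stand-in planner: produces a full-body kinematic reference each tick."""

    def plan_next_frame(self, robot_state: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # A real planner would generate a natural whole-body reference toward the
        # goal; here we return a zero reference as a placeholder.
        return np.zeros(N_JOINTS)


class TrackingPolicy:
    """Stand-in for the learned whole-body tracking controller."""

    def act(self, robot_state: np.ndarray, reference: np.ndarray) -> np.ndarray:
        # A trained network would map (state, reference) -> joint-level actions.
        return np.zeros(N_JOINTS)


class DummyRobot:
    """Toy robot interface so the loop below actually runs."""

    def observe(self) -> np.ndarray:
        return np.zeros(64)          # proprioceptive state placeholder

    def apply(self, action: np.ndarray) -> None:
        pass                         # would send joint targets to hardware or sim


def run(robot, planner, policy, goal):
    """Planner -> tracker loop: the policy never sees the task, only references."""
    for _ in range(CONTROL_STEPS):
        state = robot.observe()
        reference = planner.plan_next_frame(state, goal)
        action = policy.act(state, reference)
        robot.apply(action)


if __name__ == "__main__":
    run(DummyRobot(), KinematicPlanner(), TrackingPolicy(), goal=np.array([1.0, 0.0, 0.0]))
```
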
Authors

Zhengyi Luo
Nvidia

Ye Yuan
Nvidia

Tingwu Wang
Nvidia

Chenran Li
PhD, University of California, Berkeley
reinforcement learning, motion planning, simulation, autonomous driving, behavior modeling

Sirui Chen
University of Illinois Urbana-Champaign
Reinforcement Learning, Information Retrieval

Fernando Castaneda
Nvidia

Zi-ang Cao
MS Student, Stanford University
Robotics, Machine Learning, Computer Vision

Jiefeng Li
Research Scientist, NVIDIA Research
Computer Vision, Machine Learning, Digital Humans

David Minor
Nvidia

Qingwei Ben
The Chinese University of Hong Kong
Robot Learning, Embodied AI, Humanoids

Xingye Da
Nvidia

Runyu Ding
The University of Hong Kong
Computer Vision, Deep Learning

Cyrus Hogg
Nvidia

Lina Song
Nvidia

Edy Lim
Nvidia

Eugene Jeong
Nvidia

Tairan He
Carnegie Mellon University
Robotics, Reinforcement Learning, Robot Learning, Imitation Learning

Haoru Xue
PhD in AI Robotics, UC Berkeley
robot learning, VLA, humanoid

Wenli Xiao
PhD in Robotics, Carnegie Mellon University
Robot Learning, Reinforcement Learning, Humanoids

Zi Wang
Nvidia

Simon Yuen
Nvidia

Jan Kautz
Vice President of Research, NVIDIA Research
Computer Vision, Machine Learning, Visual Computing

Yan Chang
Ph.D, University of Michigan, NVIDIA
Autonomous Vehicles, Robotics, Machine Learning, Automotive Systems, Energy Systems

Umar Iqbal
Nvidia

Linxi Jim Fan
Nvidia

Yuke Zhu
The University of Texas at Austin, NVIDIA Research
Robot Learning, Computer Vision, Machine Learning, Robotics, Artificial Intelligence