X-Loco: Towards Generalist Humanoid Locomotion Control via Synergetic Policy Distillation

📅 2026-03-04
🤖 AI Summary
Humanoid robots struggle to acquire diverse and often conflicting motor skills, such as upright walking, fall recovery, and whole-body coordination, within a single control policy. To address this, the work proposes X-Loco, a framework that trains multiple specialist (expert) policies and distills them into one student policy, using a case-adaptive specialist selection mechanism that dynamically draws on the specialists to guide the student. The student relies solely on visual inputs and velocity commands and learns general-purpose locomotion control end to end. According to the authors, this is the first vision-based unified controller to combine these skills without reference motions, and it performs strongly on tasks such as fall recovery and traversal of complex terrain. Ablation studies indicate that the framework makes effective use of specialist expertise and improves learning efficiency.

📝 Abstract
While recent advances have demonstrated strong performance in individual humanoid skills such as upright locomotion, fall recovery, and whole-body coordination, learning a single policy that masters all of these skills remains challenging because of the diverse dynamics and conflicting control objectives involved. To address this, we introduce X-Loco, a framework for training a vision-based generalist humanoid locomotion policy. X-Loco trains multiple oracle specialist policies and adopts synergetic policy distillation with a case-adaptive specialist selection mechanism, which dynamically leverages the specialist policies to guide a vision-based student policy. This design enables the student to acquire a broad spectrum of locomotion skills, ranging from fall recovery to terrain traversal and whole-body coordination. To the best of our knowledge, X-Loco is the first framework to demonstrate vision-based humanoid locomotion that jointly integrates upright locomotion, whole-body coordination, and fall recovery while operating solely under velocity commands, without relying on reference motions. Experimental results show that X-Loco achieves superior performance on tasks such as fall recovery and terrain traversal. Ablation studies further highlight that our framework effectively leverages specialist expertise and enhances learning efficiency.
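The abstract describes distillation in which a selection mechanism chooses which specialist supervises the student in each situation. The following is a minimal sketch of that idea, not the authors' implementation: the three toy specialists, the state keys `base_height` and `terrain_roughness`, the thresholds, and the rule-based `select_specialist` function are all hypothetical stand-ins, and the mean-squared-error imitation loss is an assumed choice.

```python
from typing import Callable, Dict, List, Sequence

Action = List[float]
State = Dict[str, float]


# Hypothetical specialists: each maps a privileged state to an action.
# Real specialists would be trained policies; constants keep the sketch small.
def recovery_specialist(state: State) -> Action:
    return [1.0, 0.0]


def terrain_specialist(state: State) -> Action:
    return [0.0, 1.0]


def coordination_specialist(state: State) -> Action:
    return [0.5, 0.5]


SPECIALISTS: Dict[str, Callable[[State], Action]] = {
    "recovery": recovery_specialist,
    "terrain": terrain_specialist,
    "coordination": coordination_specialist,
}


def select_specialist(state: State) -> str:
    """Pick a teacher per sample (assumed hand-written rule, for illustration)."""
    if state["base_height"] < 0.4:        # assumed threshold: robot has fallen
        return "recovery"
    if state["terrain_roughness"] > 0.1:  # assumed threshold: uneven ground
        return "terrain"
    return "coordination"


def distillation_loss(student_actions: Sequence[Action],
                      states: Sequence[State]) -> float:
    """Mean squared error between student actions and the selected teacher's."""
    total = 0.0
    for action, state in zip(student_actions, states):
        target = SPECIALISTS[select_specialist(state)](state)
        total += sum((a - t) ** 2 for a, t in zip(action, target)) / len(target)
    return total / len(states)
```

In a training loop, the student's actions would come from a vision-based network and the loss would be minimized by gradient descent; here the student actions are passed in directly so that the selection and loss logic can be checked in isolation.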
Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion
generalist policy
fall recovery
whole-body coordination
policy distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

synergetic policy distillation
generalist humanoid locomotion
vision-based control
specialist policy selection
fall recovery
Dewei Wang
USTC
Robotics
Xinmiao Wang
Institute of Artificial Intelligence (TeleAI), China Telecom
Chenyun Zhang
Institute of Artificial Intelligence (TeleAI), China Telecom
Jiyuan Shi
Tsinghua University
Reinforcement Learning, Robotics
Yingnan Zhao
Harbin Engineering University
Chenjia Bai
Institute of Artificial Intelligence (TeleAI), China Telecom
Reinforcement Learning, Robotics, Embodied AI
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom