OmniClone: Engineering a Robust, All-Rounder Whole-Body Humanoid Teleoperation System

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing whole-body humanoid teleoperation systems, which often suffer from tightly coupled components, prohibitive deployment costs, and insufficient fine-grained diagnostics, making it difficult to balance robustness, generality, and practicality. The authors propose a lightweight, high-fidelity teleoperation framework that is compatible with multiple skill modalities, runs in real time on a single consumer-grade GPU, and generalizes across operators of diverse body morphologies. Key innovations include the OmniBench diagnostic benchmark, identity-agnostic motion retargeting, a low-latency communication architecture, and a compact policy network. The system reduces MPJPE by over 66% on unseen motions and cuts computational overhead by several orders of magnitude relative to current methods, substantially enhancing both practical utility and scalability.

📝 Abstract
Whole-body humanoid teleoperation enables humans to remotely control humanoid robots, serving as both a real-time operational tool and a scalable engine for collecting demonstrations for autonomous learning. Despite recent advances, existing systems are validated using aggregate metrics that conflate distinct motion regimes, masking critical failure modes. This lack of diagnostic granularity, compounded by tightly coupled and labor-intensive system configurations, hinders robust real-world deployment. A key open challenge is building a teleoperation system that is simultaneously robust, versatile, and affordable for practical use. Here we present OmniClone, a whole-body humanoid teleoperation system that achieves high-fidelity, multi-skill control on a single consumer GPU with modest data requirements. Central to our approach is OmniBench, a diagnostic benchmark that evaluates policies across stratified motion categories and difficulty levels on unseen motions, exposing the narrow specialization of prior systems. Guided by these diagnostics, we identify an optimized training-data recipe and integrate system-level improvements (subject-agnostic retargeting and robust communication) that collectively reduce Mean Per-Joint Position Error (MPJPE) by over 66% while requiring orders of magnitude fewer computational resources than comparable methods. Crucially, OmniClone is control-source-agnostic: a single unified policy supports real-time teleoperation, generated motion playback, and Vision-Language-Action (VLA) models, while generalizing across operators of vastly different body proportions. By uniting diagnostic evaluation with practical engineering, OmniClone provides an accessible foundation for scalable humanoid teleoperation and autonomous learning.
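For readers unfamiliar with the metric the abstract reports, MPJPE averages the Euclidean distance between predicted and reference joint positions over all joints and frames. A minimal sketch (the array shapes and toy data here are illustrative, not taken from the paper):

```python
import numpy as np

def mpjpe(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Per-Joint Position Error: mean Euclidean distance
    between predicted and reference joint positions.

    pred, target: arrays of shape (num_frames, num_joints, 3),
    typically expressed in millimeters.
    """
    assert pred.shape == target.shape
    # Per-joint distance for every frame, then the global mean.
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Toy example: 2 frames, 2 joints, prediction offset by 30 mm along x.
target = np.zeros((2, 2, 3))
pred = target.copy()
pred[..., 0] += 30.0
print(mpjpe(pred, target))  # 30.0
```

A "66% reduction" in this metric means the average per-joint tracking error shrinks to roughly one third of the baseline value.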
Problem

Research questions and friction points this paper is trying to address.

humanoid teleoperation
robustness
versatility
diagnostic benchmark
real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

whole-body teleoperation
diagnostic benchmark
subject-agnostic retargeting
unified policy
humanoid robotics
👥 Authors

Yixuan Li
School of Computer Science and Technology, Beijing Institute of Technology; Beijing Institute for General Artificial Intelligence (BIGAI); State Key Lab of General AI

Le Ma
Beijing Institute for General Artificial Intelligence (BIGAI); State Key Lab of General AI

Yutang Lin
Institute for AI, Peking University; Beijing Institute for General Artificial Intelligence (BIGAI); School of Psychological and Cognitive Sciences, Peking University; State Key Lab of General AI; Beijing Key Laboratory of Behavior and Mental Health, Peking University; Embodied Intelligence Lab, PKU-Wuhan Institute for Artificial Intelligence

Yushi Du
Department of Electrical and Electronic Engineering, The University of Hong Kong; Beijing Institute for General Artificial Intelligence (BIGAI); State Key Lab of General AI

Mengya Liu
Beijing Institute for General Artificial Intelligence (BIGAI); State Key Lab of General AI

Kaizhe Hu
Tsinghua University
Reinforcement Learning · Robotics

Jieming Cui
Peking University

Yixin Zhu
Assistant Professor, Peking University
Computer Vision · Visual Reasoning · Human-Robot Teaming

Wei Liang
School of Computer Science and Technology, Beijing Institute of Technology; Yangtze Delta Region Academy of Beijing Institute of Technology

Baoxiong Jia
Ph.D. in Computer Science, UCLA
Computer Vision · Artificial Intelligence

Siyuan Huang
Beijing Institute for General Artificial Intelligence (BIGAI)
Embodied AI · 3D Vision · Robotics · 3D Scene Understanding