🤖 AI Summary
This work addresses the limitations of existing full-body humanoid teleoperation systems, which often suffer from high coupling, prohibitive deployment costs, and insufficient fine-grained diagnostic capabilities, making it difficult to balance robustness, generality, and practicality. The authors propose a lightweight, high-fidelity teleoperation framework compatible with multiple skill modalities, capable of real-time execution on a single consumer-grade GPU and generalizing across operators of diverse body morphologies. Key innovations include the OmniBench diagnostic benchmark, identity-agnostic motion retargeting, a low-latency communication architecture, and a compact policy network. The system reduces MPJPE by over 66% on unseen motions and decreases computational overhead by orders of magnitude compared to current methods, substantially enhancing both practical utility and scalability.
📝 Abstract
Whole-body humanoid teleoperation enables humans to remotely control humanoid robots, serving as both a real-time operational tool and a scalable engine for collecting demonstrations for autonomous learning. Despite recent advances, existing systems are validated using aggregate metrics that conflate distinct motion regimes, masking critical failure modes. This lack of diagnostic granularity, compounded by tightly coupled and labor-intensive system configurations, hinders robust real-world deployment. A key open challenge is building a teleoperation system that is simultaneously robust, versatile, and affordable for practical use. Here we present OmniClone, a whole-body humanoid teleoperation system that achieves high-fidelity, multi-skill control on a single consumer GPU with modest data requirements. Central to our approach is OmniBench, a diagnostic benchmark that evaluates policies across stratified motion categories and difficulty levels on unseen motions, exposing the narrow specialization of prior systems. Guided by these diagnostics, we identify an optimized training data recipe and integrate system-level improvements (subject-agnostic retargeting and robust communication) that collectively reduce Mean Per-Joint Position Error (MPJPE) by over 66% while requiring orders-of-magnitude fewer computational resources than comparable methods. Crucially, OmniClone is control-source-agnostic: a single unified policy supports real-time teleoperation, generated motion playback, and Vision-Language-Action (VLA) models, while generalizing across operators of vastly different body proportions. By uniting diagnostic evaluation with practical engineering, OmniClone provides an accessible foundation for scalable humanoid teleoperation and autonomous learning.
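For readers unfamiliar with the headline metric, MPJPE is the standard fidelity measure for pose tracking: the Euclidean distance between predicted and ground-truth joint positions, averaged over joints and frames. A minimal sketch (the array shapes and toy data below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error.

    pred, gt: arrays of shape (T, J, 3) holding 3D joint positions
    for T frames and J joints. Returns the mean Euclidean distance
    across all joints and frames (same units as the inputs).
    """
    # Per-joint Euclidean distance over the last (xyz) axis, then average.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: every joint offset by 0.1 m along x in every frame.
gt = np.zeros((2, 5, 3))
pred = gt.copy()
pred[..., 0] += 0.1
print(mpjpe(pred, gt))  # → 0.1
```

A uniform 0.1 m offset on every joint yields an MPJPE of exactly 0.1 m, so the metric reads directly as "average joint displacement"; a 66% reduction means tracked joints land roughly three times closer to the reference motion.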