🤖 AI Summary
Existing teleoperation systems for humanoid robots suffer from high latency because they rely on full-body motion retargeting and position-only PD control, which hinders performance in dynamic interaction tasks. This work proposes ExtremControl, a low-latency whole-body control framework that avoids full-body retargeting. It operates directly on the SE(3) poses of selected rigid links (mainly the extremities), maps human motion to these link targets in Cartesian space, and adds velocity feedforward to the low-level controller to improve responsiveness. The resulting teleoperation system supports both optical motion capture and VR-based tracking and achieves end-to-end latency as low as 50 ms, well below the roughly 200 ms latency observed in prior work. Experiments show successful execution of highly dynamic tasks such as ping-pong ball balancing, juggling, and real-time ball return, underscoring the system's capability for agile human-robot interaction.
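The Cartesian-space mapping is described only at a high level. The following is a minimal sketch, assuming that each tracked human link pose and the human root (pelvis) pose arrive as 4x4 homogeneous transforms. The link pose is expressed relative to the human root, its translation is scaled by a hypothetical robot-to-human size ratio `scale`, its orientation is copied unchanged, and the result is re-anchored at the robot root. The helper names (`se3`, `se3_inverse`, `map_link_target`) and the scaling rule are illustrative assumptions, not ExtremControl's published mapping.

```python
import numpy as np


def se3(rotation, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def se3_inverse(T: np.ndarray) -> np.ndarray:
    """Invert a rigid transform using R^T instead of a general matrix inverse."""
    R, p = T[:3, :3], T[:3, 3]
    return se3(R.T, -R.T @ p)


def map_link_target(T_human_root, T_human_link, T_robot_root, scale):
    """Hypothetical Cartesian-space mapping for a single tracked link.

    1. Express the human link pose in the human root frame.
    2. Scale its translation by `scale` (assumed robot/human size ratio)
       and keep its orientation unchanged.
    3. Re-anchor the scaled relative pose at the robot root.
    No joint-space retargeting is performed.
    """
    T_rel = se3_inverse(T_human_root) @ T_human_link
    T_rel_scaled = se3(T_rel[:3, :3], scale * T_rel[:3, 3])
    return T_robot_root @ T_rel_scaled


# Example: human wrist 0.5 m forward and 1.2 m up from a pelvis at the origin,
# mapped onto a robot whose root sits 1 m along world x.
T_h_root = se3(np.eye(3), [0.0, 0.0, 0.0])
T_h_wrist = se3(np.eye(3), [0.5, 0.0, 1.2])
T_r_root = se3(np.eye(3), [1.0, 0.0, 0.0])
target = map_link_target(T_h_root, T_h_wrist, T_r_root, scale=0.5)
# target translation ≈ [1.25, 0.0, 0.6]
```

In this sketch, only the selected links are mapped, so no per-frame whole-body retargeting optimization is needed; converting the resulting link targets into joint commands is assumed to be left to the whole-body controller.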
📝 Abstract
Building a low-latency humanoid teleoperation system is essential for collecting diverse reactive and dynamic demonstrations. However, existing approaches rely on heavily pre-processed human-to-humanoid motion retargeting and position-only PD control, resulting in substantial latency that severely limits responsiveness and prevents tasks requiring rapid feedback and fast reactions. To address this problem, we propose ExtremControl, a low-latency whole-body control framework that: (1) operates directly on SE(3) poses of selected rigid links, primarily humanoid extremities, to avoid full-body retargeting; (2) uses a Cartesian-space mapping to convert human motion directly into humanoid link targets; and (3) incorporates velocity feedforward control at the low level to support highly responsive behavior under rapidly changing control interfaces. We further provide a unified theoretical formulation of ExtremControl and systematically validate its effectiveness through experiments in both simulation and real-world environments. Building on ExtremControl, we implement a low-latency humanoid teleoperation system that supports both optical motion capture and VR-based motion tracking. It achieves end-to-end latency as low as 50 ms, far below the 200 ms latency limit observed in prior work, and enables highly responsive behaviors such as ping-pong ball balancing, juggling, and real-time return.
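The abstract does not spell out the low-level control law, so the following is only a sketch of the general technique, not ExtremControl's actual controller. A common way to add velocity feedforward to a joint PD controller is to pass the desired joint velocity into the damping term, tau = kp (q_des - q) + kd (qd_des - qd); position-only PD is the special case qd_des = 0. The function names `pd_torque` and `track_ramp`, the gains, and the 1-DoF double-integrator plant (no gravity, friction, or torque limits) are all hypothetical choices made for illustration.

```python
def pd_torque(q_des, q, qd_des, qd, kp, kd):
    """Joint-space PD law; qd_des = 0 recovers position-only PD control."""
    return kp * (q_des - q) + kd * (qd_des - qd)


def track_ramp(use_feedforward, speed=1.0, kp=100.0, kd=20.0,
               mass=1.0, dt=1e-3, duration=3.0):
    """Simulate a 1-DoF joint tracking the moving target q_des(t) = speed * t.

    Returns the final tracking error q_des - q. The plant is a bare double
    integrator, an assumption made purely for illustration.
    """
    q, qd = 0.0, 0.0
    steps = int(round(duration / dt))
    for k in range(steps):
        q_des = speed * k * dt
        # Feedforward supplies the target's velocity; position-only PD uses 0.
        qd_des = speed if use_feedforward else 0.0
        tau = pd_torque(q_des, q, qd_des, qd, kp, kd)
        # Semi-implicit Euler: update velocity first, then position.
        qd += dt * tau / mass
        q += dt * qd
    return speed * steps * dt - q


err_pd = track_ramp(use_feedforward=False)  # ≈ 0.2, i.e. kd * speed / kp
err_ff = track_ramp(use_feedforward=True)   # ≈ 0.0
```

The comparison shows why feedforward helps when targets change quickly: with position-only PD the damping term resists the very motion the target demands, so the joint settles into a constant lag of kd * speed / kp behind a moving target, while feeding the target velocity into the damping term drives that lag to zero.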