🤖 AI Summary
This work addresses the lack of open-source, low-latency, and unified multimodal support for VR-based teleoperation across heterogeneous robotic platforms. We propose an open-source, bimanual, multi-embodiment VR teleoperation system. Methodologically, we design a zero-copy streaming architecture and an asynchronous "think-act" control loop to enable real-time dexterous control, synchronized multimodal data acquisition, and policy learning across diverse platforms—from 7-DoF manipulators to full-body humanoid robots. The system integrates consumer-grade VR hardware, modular robot interfaces, low-latency network APIs, and multimodal temporal alignment techniques, natively supporting the LeRobot data format and state-of-the-art visuomotor policies including ACT, DiffusionPolicy, and SmolVLA. End-to-end latency is ≤35 ms; high-fidelity demonstration datasets are successfully collected across multiple tasks, and cross-morphology policy transfer is empirically validated.
📝 Abstract
BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face. Code, datasets, and VR app available at https://github.com/ARCLab-MIT/BEAVR-Bot.
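The asynchronous "think-act" control loop mentioned above decouples slow policy inference from the fast control loop. The sketch below illustrates the general pattern only: a background "think" thread runs inference and publishes action chunks, while the "act" loop consumes the most recent chunk at a fixed control rate. All class and function names here (`ThinkActLoop`, `get_obs`, `send_action`) are hypothetical and are not taken from the BEAVR codebase.

```python
import threading
import time
from collections import deque


class ThinkActLoop:
    """Illustrative think-act pattern: inference ("think") runs in a
    background thread; the control loop ("act") consumes the latest
    action chunk at a fixed rate, so slow inference never blocks control."""

    def __init__(self, policy, control_hz=50.0):
        self.policy = policy          # callable: observation -> list of actions
        self.period = 1.0 / control_hz
        self.actions = deque()        # most recent action chunk
        self.lock = threading.Lock()
        self.running = False

    def _think(self, get_obs):
        # Slow path: repeatedly run inference and swap in a fresh chunk.
        while self.running:
            chunk = self.policy(get_obs())
            with self.lock:
                self.actions = deque(chunk)

    def run(self, get_obs, send_action, steps):
        self.running = True
        thinker = threading.Thread(target=self._think, args=(get_obs,),
                                   daemon=True)
        thinker.start()
        # Fast path: act at a fixed rate using whatever chunk is freshest.
        for _ in range(steps):
            with self.lock:
                action = self.actions.popleft() if self.actions else None
            if action is not None:
                send_action(action)
            time.sleep(self.period)
        self.running = False
        thinker.join(timeout=1.0)


if __name__ == "__main__":
    sent = []
    # Dummy policy producing 5-step action chunks from a scalar observation.
    loop = ThinkActLoop(policy=lambda obs: [obs + i for i in range(5)],
                        control_hz=200.0)
    loop.run(get_obs=lambda: 0.0, send_action=sent.append, steps=20)
    print(f"actions sent: {len(sent)}")
```

Because the actor never waits on inference, control-loop jitter stays bounded by the loop period rather than by policy latency, which is the property that makes chunked policies such as ACT practical for real-time teleoperation.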