Robo-DM: Data Management For Large Robot Datasets

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multimodal robot trajectory data (video, text, and numerical streams) is large, slow to load, and hard to curate and share at scale. To address this, the paper proposes Robo-DM, an open-source, cloud-based toolkit for managing trajectory data. Datasets are stored in a self-contained format built on Extensible Binary Meta Language (EBML) that supports both lossy and lossless compression, saving up to 70× (lossy) or 3.5× (lossless) space relative to the RLDS format. Load-balanced video decoding with memory-mapped decoding caches makes sequential decoding up to 50× faster than LeRobot. In a physical pick-and-place evaluation, a model trained on data compressed 75× showed no reduction in downstream task accuracy. The aim is an efficient, scalable data infrastructure for training large transformer models across diverse robots and tasks.

📝 Abstract

Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories, which typically consist of video, textual, and numerical modalities, including streams from multiple cameras, remains challenging. We propose Robo-DM, an efficient open-source cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format with Extensible Binary Meta Language (EBML). Robo-DM can significantly reduce the size of robot trajectory data, transfer costs, and data load time during training. Compared to the RLDS format used in OXE datasets, Robo-DM's compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster when decoding sequentially. In physical experiments on a pick-and-place task, we evaluate an In-Context Robot Transformer model trained on Robo-DM data with lossy compression. With the original dataset compressed 75x, the model suffers no reduction in downstream task accuracy.
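
To put the reported ratios in concrete terms, the short Python sketch below converts a compression ratio into a stored size and a fraction of space saved. The 1000 GB dataset size is a hypothetical figure chosen for illustration, not a number from the paper; only the 70x and 3.5x ratios come from the abstract.

```python
def compressed_size_gb(original_gb: float, ratio: float) -> float:
    """Size after compression, given an original size and a ratio such as 70 (for 70x)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_gb / ratio


def space_saved_fraction(ratio: float) -> float:
    """Fraction of the original storage that a given compression ratio removes."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    original_gb = 1000.0  # hypothetical dataset size, for illustration only
    for label, ratio in [("lossy", 70.0), ("lossless", 3.5)]:
        size = compressed_size_gb(original_gb, ratio)
        saved = space_saved_fraction(ratio)
        print(f"{label}: {size:.1f} GB stored, {saved:.1%} saved")
```

Under these assumptions, the lossy setting would store roughly 14 GB in place of 1000 GB, and the lossless setting roughly 286 GB.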

Problem

Research questions and friction points this paper is trying to address.

Managing large multimodal robot datasets efficiently

Reducing data storage and transfer costs significantly

Accelerating data retrieval and training load times

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cloud-based toolkit for robot data management

Self-contained EBML-based storage format with lossy and lossless compression

Memory-mapped decoding caches for faster retrieval
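
The general idea behind a memory-mapped decoding cache can be sketched in a few lines of Python. This is a minimal illustration of the technique, not Robo-DM's implementation: the file layout, the `MmapFrameCache` class, the tiny frame size, and the `fake_decode` stand-in for a real video decoder are all hypothetical. Decoded frames are written once into a memory-mapped file, so later reads (including reads after reopening the file) skip decoding.

```python
import mmap
import os
import tempfile

FRAME_BYTES = 4 * 4 * 3  # hypothetical tiny 4x4 RGB frame


def fake_decode(index: int) -> bytes:
    """Stand-in for an expensive video decode (hypothetical, deterministic)."""
    return bytes([index % 256]) * FRAME_BYTES


class MmapFrameCache:
    """File layout (an assumption): one validity byte per frame, then the frames."""

    def __init__(self, path: str, num_frames: int):
        if num_frames <= 0:
            raise ValueError("num_frames must be positive")
        self.num_frames = num_frames
        self.decode_calls = 0
        total = num_frames + num_frames * FRAME_BYTES
        # Create (or reset) the backing file if it is missing or has the wrong size.
        if not os.path.exists(path) or os.path.getsize(path) != total:
            with open(path, "wb") as f:
                f.write(b"\x00" * total)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), total)

    def get(self, index: int) -> bytes:
        if not 0 <= index < self.num_frames:
            raise IndexError(index)
        start = self.num_frames + index * FRAME_BYTES
        if self._map[index] == 0:  # cache miss: decode once and store
            self._map[start:start + FRAME_BYTES] = fake_decode(index)
            self._map[index] = 1
            self.decode_calls += 1
        return self._map[start:start + FRAME_BYTES]

    def close(self) -> None:
        self._map.flush()
        self._map.close()
        self._file.close()


def demo() -> tuple:
    """Read three frames twice, then reopen the cache and read them again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frames.cache")
        cache = MmapFrameCache(path, num_frames=8)
        for _ in range(2):
            for i in range(3):
                assert cache.get(i) == fake_decode(i)
        first_run = cache.decode_calls
        cache.close()

        reopened = MmapFrameCache(path, num_frames=8)
        for i in range(3):
            assert reopened.get(i) == fake_decode(i)
        second_run = reopened.decode_calls
        reopened.close()
        return first_run, second_run
```

In this sketch each frame is decoded at most once per cache file. A real system would also need to handle concurrent writers, variable frame sizes, and cache eviction, none of which are modeled here.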

🔎 Similar Papers

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset