🤖 AI Summary
Existing teleoperation datasets suffer from poor scalability, unsmooth trajectories, and weak cross-platform generalization, limiting their applicability to complex real-world manipulation tasks. To address these limitations, we propose FastUMI—a modular, hardware-decoupled, lightweight teleoperation framework integrating multi-view fisheye vision, end-effector state sensing, and natural-language annotations, coupled with efficient pose tracking for high-fidelity multimodal trajectory acquisition. Leveraging FastUMI, we construct FastUMI-100K: a large-scale robot manipulation dataset comprising over 100K UMI-style demonstrations collected in household scenes. FastUMI-100K significantly improves data scale, trajectory smoothness, and adaptability across diverse robotic platforms. Evaluated on multiple baseline policy-learning algorithms, FastUMI-100K achieves consistently high success rates—demonstrating its robust modeling capability for dynamic, long-horizon manipulation tasks and its practical deployment value.
📝 Abstract
Data-driven robotic manipulation learning depends on large-scale, high-quality expert demonstration datasets. However, existing datasets, which primarily rely on human-teleoperated robot collection, are limited in scalability, trajectory smoothness, and applicability across different robotic embodiments in real-world environments. In this paper, we present FastUMI-100K, a large-scale UMI-style multimodal demonstration dataset designed to overcome these limitations and meet the growing complexity of real-world manipulation tasks. Collected by FastUMI, a novel robotic system featuring a modular, hardware-decoupled mechanical design and an integrated lightweight tracking system, FastUMI-100K offers a more scalable, flexible, and adaptable solution for the diverse requirements of real-world robot demonstration data. Specifically, FastUMI-100K contains over 100K demonstration trajectories collected across representative household environments, covering 54 tasks and hundreds of object types. Our dataset integrates multimodal streams, including end-effector states, multi-view wrist-mounted fisheye images, and textual annotations. Each trajectory ranges from 120 to 500 frames in length. Experimental results demonstrate that FastUMI-100K enables high policy success rates across various baseline algorithms, confirming its robustness, adaptability, and real-world applicability for solving complex, dynamic manipulation challenges. The source code and dataset will be released at https://github.com/MrKeee/FastUMI-100K.