RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-augmentation methods for robot imitation learning rely on stringent prerequisites—such as precise camera calibration or chroma-key green screens—limiting generalizability and deployment practicality. This paper introduces the first end-to-end, plug-and-play robotic scene generation framework that requires no environmental modification. Our approach comprises three key components: (1) constructing the first semantic-level robotic instance segmentation dataset and training a highly generalizable segmentation model; (2) designing a task-aware diffusion-based background generation module that jointly incorporates physical constraints and task semantics; and (3) integrating a lightweight API framework. Given only a single demonstration scene, our method generalizes to six unseen environments, boosting task success rates by over 200%. All datasets, model weights, and toolkits are publicly released.

📝 Abstract
Visual augmentation has become a crucial technique for enhancing the visual robustness of imitation learning. However, existing methods are often limited by prerequisites such as camera calibration or the need for controlled environments (e.g., green screen setups). In this work, we introduce RoboEngine, the first plug-and-play visual robot data augmentation toolkit. For the first time, users can effortlessly generate physics- and task-aware robot scenes with just a few lines of code. To achieve this, we present a novel robot scene segmentation dataset, a generalizable high-quality robot segmentation model, and a fine-tuned background generation model, which together form the core components of the out-of-the-box toolkit. Using RoboEngine, we demonstrate the ability to generalize robot manipulation tasks across six entirely new scenes, based solely on demonstrations collected from a single scene, achieving a more than 200% performance improvement compared to the no-augmentation baseline. All datasets, model weights, and the toolkit will be publicly released.
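The core augmentation step the paper describes is conceptually simple: segment the robot (and task-relevant objects) out of a demonstration frame, then composite those pixels onto a newly generated background. The sketch below is a minimal illustration of that compositing step using NumPy; the function name and interface are assumptions for illustration, not the actual RoboEngine API, and the segmentation mask and generated background are taken as given inputs (in the real toolkit they come from the segmentation and diffusion models).

```python
import numpy as np

def composite_augmented_scene(image, robot_mask, new_background):
    """Paste robot pixels from a demonstration frame onto a new background.

    Hypothetical helper, not the RoboEngine API.
    image:          H x W x 3 uint8 source frame
    robot_mask:     H x W bool mask (True where the robot is)
    new_background: H x W x 3 uint8 background, e.g. from a diffusion model
    """
    out = new_background.copy()
    out[robot_mask] = image[robot_mask]  # keep robot pixels, replace the rest
    return out

# Toy example: a 2x2 frame where only the top-left pixel belongs to the robot.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.zeros((2, 2), dtype=bool)
mask[0, 0] = True
background = np.zeros((2, 2, 3), dtype=np.uint8)
augmented = composite_augmented_scene(image, mask, background)
```

In practice the quality of this step hinges on the segmentation model generalizing to unseen scenes, which is why the paper's semantic robot segmentation dataset is a central contribution.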
Problem

Research questions and friction points this paper is trying to address.

Enhancing visual robustness in imitation learning
Overcoming camera calibration and controlled environment limitations
Generalizing robot tasks across new scenes efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play robot data augmentation toolkit
Semantic robot segmentation and background generation
Physics- and task-aware scene generation
Chengbo Yuan
Institute for Interdisciplinary Information Science (IIIS), Tsinghua University
Embodied AI · Computer Vision · Robot Learning · Agent
Suraj Joshi
Tsinghua University
Machine Learning · Robotics · Computer Vision
Shaoting Zhu
PhD Student, Tsinghua University
Robot Learning · Computer Vision · Artificial Intelligence
Hang Su
Department of Computer Science and Technology, Tsinghua University
Hang Zhao
Institute for Interdisciplinary Information Sciences, Tsinghua University; Shanghai Qi Zhi Institute; Shanghai Artificial Intelligence Laboratory
Yang Gao
Institute for Interdisciplinary Information Sciences, Tsinghua University; Shanghai Qi Zhi Institute; Shanghai Artificial Intelligence Laboratory