RoboAug: One Annotation to Hundreds of Scenes via Region-Contrastive Data Augmentation for Robotic Manipulation

πŸ“… 2026-02-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes RoboAug, a novel approach to enhance robotic generalization in unseen environments while reducing reliance on large-scale datasets and perfect object detection. Leveraging only a single bounding box annotation per image, RoboAug employs a pretrained generative model for semantic data augmentation and introduces a plug-and-play region-wise contrastive loss to guide the policy toward task-relevant regions. This method enables the generation of hundreds of diverse training scenes from a single annotated exampleβ€”a first in the field. Evaluated on three robotic platforms (UR-5e, AgileX, and Tien Kung 2.0), RoboAug significantly improves task success rates in unseen scenarios from 0.09/0.16/0.19 to 0.47/0.60/0.67, outperforming existing data augmentation techniques.
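The paper's exact loss formulation is not reproduced on this page; the sketch below shows one plausible reading of a "region-wise contrastive loss" as an InfoNCE-style contrast that pulls bounding-box-region features together across two augmented views while pushing them away from background features. Per-patch features, the binary region mask, and the temperature value are all assumptions, not the authors' implementation.

```python
import numpy as np

def region_contrastive_loss(feat_a, feat_b, mask, tau=0.1):
    """InfoNCE-style region contrast between two augmented views (a sketch).

    feat_a, feat_b: (N, D) per-patch features from two views of one image.
    mask: (N,) boolean, True for patches inside the annotated bounding box.
    Each region patch is pulled toward its counterpart in the other view;
    the other view's background patches serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    losses = []
    for i in np.where(mask)[0]:
        pos = np.exp(a[i] @ b[i] / tau)              # same patch, other view
        neg = np.exp(a[i] @ b[~mask].T / tau).sum()  # background negatives
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses))
```

Because the loss is computed only over patches inside the annotated region, it is plug-and-play in the sense that it can be added to any policy whose visual encoder exposes spatial features.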

Technology Category

Application Category

πŸ“ Abstract
Enhancing the generalization capability of robotic learning to enable robots to operate effectively in diverse, unseen scenes is a fundamental and challenging problem. Existing approaches often depend on pretraining with large-scale data collection, which is labor-intensive and time-consuming, or on semantic data augmentation techniques that necessitate an impractical assumption of flawless upstream object detection in real-world scenarios. In this work, we propose RoboAug, a novel generative data augmentation framework that significantly minimizes the reliance on large-scale pretraining and the perfect visual recognition assumption by requiring only the bounding box annotation of a single image during training. Leveraging this minimal information, RoboAug employs pre-trained generative models for precise semantic data augmentation and integrates a plug-and-play region-contrastive loss to help models focus on task-relevant regions, thereby improving generalization and boosting task success rates. We conduct extensive real-world experiments on three robots, namely UR-5e, AgileX, and Tien Kung 2.0, spanning over 35k rollouts. Empirical results demonstrate that RoboAug significantly outperforms state-of-the-art data augmentation baselines. Specifically, when evaluating generalization capabilities in unseen scenes featuring diverse combinations of backgrounds, distractors, and lighting conditions, our method achieves substantial gains over the baseline without augmentation. The success rates increase from 0.09 to 0.47 on UR-5e, from 0.16 to 0.60 on AgileX, and from 0.19 to 0.67 on Tien Kung 2.0. These results highlight the superior generalization and effectiveness of RoboAug in real-world manipulation tasks. Our project is available at https://x-roboaug.github.io/.
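The abstract describes the augmentation only at a high level: the single bounding box tells the generative model which region to preserve while the rest of the scene is resynthesized. A minimal compositing sketch of that idea follows; the function name and the precomputed `background` array (standing in for a pretrained generative model's output) are illustrative assumptions.

```python
import numpy as np

def augment_scene(image, bbox, background):
    """Keep pixels inside the single annotated bounding box and replace
    everything outside it with a synthesized background (a sketch).

    In the described framework the background would be sampled from a
    pretrained generative model; here it is any array of matching shape.
    """
    x0, y0, x1, y1 = bbox  # (left, top, right, bottom) in pixel coords
    out = background.copy()
    out[y0:y1, x0:x1] = image[y0:y1, x0:x1]
    return out

# Hypothetical usage: one annotated image, many generated scenes
image = np.ones((4, 4, 3))
scenes = [augment_scene(image, (1, 1, 3, 3), np.zeros((4, 4, 3)))
          for _ in range(3)]
```

Sampling a different generated background each time is what turns one annotated example into hundreds of distinct training scenes with the task-relevant region held fixed.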
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
generalization
data augmentation
object detection
unseen scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

RoboAug
region-contrastive loss
semantic data augmentation
minimal annotation
robotic manipulation
Authors

Xinhua Wang (Beijing Innovation Center of Humanoid Robotics)
Kun Wu (Beijing Innovation Center of Humanoid Robotics)
Zhen Zhao (X-Humanoid)
Hu Cao (School of Computation, Information and Technology, Technical University of Munich)
Yinuo Zhao (PhD, Beijing Institute of Technology)
Zhiyuan Xu (Beijing Innovation Center of Humanoid Robotics)
Meng Li (Beijing University of Posts and Telecommunications)
Shichao Fan (Beijing Innovation Center of Humanoid Robotics; School of Mechanical Engineering and Automation, Beihang University)
Di Wu (Professor of Computer Science, Sun Yat-sen University)
Yixue Zhang (Beijing Innovation Center of Humanoid Robotics; School of Advanced Manufacturing and Robotics, Peking University)
Ning Liu (Humanoid Robotics)
Zhengping Che (X-Humanoid)
Jian Tang (Beijing Innovation Center of Humanoid Robotics)