R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in robotic manipulation: poor spatial generalization and the scarcity of real-world point cloud data, which is difficult to augment efficiently. To this end, we propose a real-to-real point cloud generation framework that requires neither simulation nor rendering. Our method integrates fine-grained scene parsing, object-group augmentation, camera-perception–guided viewpoint alignment, and trajectory annotation to perform structure-preserving spatial transformations on real-world point cloud–action pairs. The core contribution is the first demonstration of diverse, geometry-consistent, and task-relevant point cloud generation entirely within the real-data domain. Experiments show that high-quality, multi-configuration training samples can be generated from a single real-world demonstration. When applied to mobile manipulation tasks, the augmented data significantly improves both the spatial generalization and data efficiency of imitation learning policies, establishing a scalable data augmentation paradigm for robot learning in real-world settings.
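
As a rough illustration of the structure-preserving spatial transformation described above (a minimal sketch under our own assumptions: a tabletop setting, a planar perturbation, and hypothetical helper names, not the paper's implementation), the same rigid transform can be applied to a segmented object's points and to the end-effector poses that interact with it, so observation and action stay geometrically consistent:

```python
import numpy as np

def random_se2_on_table(max_shift=0.15, rng=None):
    """Sample a random tabletop rigid transform: a yaw rotation plus an XY shift."""
    rng = rng if rng is not None else np.random.default_rng()
    yaw = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:2, 3] = rng.uniform(-max_shift, max_shift, size=2)
    return T

def transform_points(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

def transform_poses(T, poses):
    """Left-multiply each 4x4 end-effector pose by the same transform."""
    return np.einsum('ij,njk->nik', T, poses)

# Usage: obj_points is a segmented object cloud, ee_poses the 4x4 waypoints
# of the trajectory segment that manipulates this object.
# T = random_se2_on_table()
# aug_points = transform_points(T, obj_points)  # new object placement
# aug_poses = transform_poses(T, ee_poses)      # actions follow the object
```

Applying one sampled `T` to both sides yields a new, consistent observation-action pair from a single demonstration, with no simulation or rendering involved.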

📝 Abstract
Towards the aim of generalized robotic manipulation, spatial generalization is the most fundamental capability: the policy must work robustly under different spatial distributions of objects, the environment, and the agent itself. Achieving this typically requires collecting substantial human demonstrations that cover diverse spatial configurations to train a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to acquire abundant spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments point cloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, and therefore efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy handles complex multi-object compositions and diverse task constraints, and camera-aware processing aligns the distribution of generated data with that of a real-world 3D sensor. Empirically, R2RGen substantially enhances data efficiency across extensive experiments and shows strong potential for scaling and for application to mobile manipulation.
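
The group-wise augmentation mentioned in the abstract can be pictured as sampling one rigid transform per constraint group rather than per object, so objects tied together by a task constraint (say, a cup that must land on its saucer) move as a unit. The sketch below is a toy illustration; the grouping, object names, and helper signatures are assumptions, not the paper's scene parser:

```python
def augment_groups(scene, groups, sample_transform, apply_transform):
    """Transform each constraint group rigidly as a unit.

    scene:  dict mapping object name -> (N, 3) point array
    groups: lists of object names that must move together to keep
            the task constraint satisfied
    """
    augmented = {}
    for group in groups:
        T = sample_transform()            # one rigid transform per group
        for name in group:
            augmented[name] = apply_transform(T, scene[name])
    return augmented

# Reusing the helpers sketched earlier:
# groups = [["cup", "saucer"], ["distractor_box"]]
# new_scene = augment_groups(scene, groups, random_se2_on_table, transform_points)
```
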
Problem

Research questions and friction points this paper is trying to address.

Generating spatially diverse real-world 3D data for robot manipulation
Overcoming sim-to-real limitations in robotic imitation learning
Enhancing data efficiency for generalized visuomotor policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-to-real 3D point cloud augmentation framework
Annotation mechanism for fine-grained scene and trajectory parsing
Camera-aware processing to align generated data with a real-world 3D sensor (see the sketch below)
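
One plausible realization of the camera-aware step in the last item, sketched under assumptions (the paper's exact procedure may differ): after transforming objects, keep only the points a single depth sensor at the real camera's pose could actually observe, for instance via Open3D's hidden-point removal:

```python
import numpy as np
import open3d as o3d  # assumed dependency for hidden-point removal

def filter_to_camera_view(points, cam_pos, radius_scale=100.0):
    """Keep only the points a single depth camera at cam_pos could see."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    diameter = np.linalg.norm(pcd.get_max_bound() - pcd.get_min_bound())
    # Hidden-point removal (Katz et al. 2007): spherical-projection radius
    _, visible_idx = pcd.hidden_point_removal(list(cam_pos), radius_scale * diameter)
    return points[np.asarray(visible_idx)]

# e.g., keep only what a camera 1 m in front of and 0.8 m above the table sees:
# visible = filter_to_camera_view(aug_points, cam_pos=(0.0, -1.0, 0.8))
```

Without such filtering, augmented clouds would contain surfaces that a real single-view sensor never observes, shifting the training distribution away from real observations.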
👥 Authors
Xiuwei Xu, Tsinghua University (computer vision, embodied AI)
Angyuan Ma, Tsinghua University
Hankun Li, Tsinghua University
Bingyao Yu, Tsinghua University
Zheng Zhu, GigaAI
Jie Zhou, Tsinghua University
Jiwen Lu, Tsinghua University