R3D: Revisiting 3D Policy Learning

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the training instability and severe overfitting that have kept powerful 3D perception models out of 3D policy learning. Through systematic diagnosis, the authors identify two primary causes: the omission of 3D data augmentation and the adverse effects of Batch Normalization. In response, they propose a new architecture engineered for stability at scale, combining a scalable Transformer-based 3D encoder with a diffusion decoder, adding 3D data augmentation, and removing Batch Normalization. The resulting method significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, providing a foundation for scalable 3D imitation learning.
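To make the idea of 3D data augmentation concrete, here is a minimal NumPy sketch of one common recipe for point clouds: a random rotation about the vertical axis, a random uniform scale, and per-point Gaussian jitter. This page does not say which augmentations the authors use, so the function name `augment_point_cloud`, its parameters, and the default ranges are illustrative assumptions, not the paper's method.

```python
import numpy as np


def augment_point_cloud(points, rng, max_angle=np.pi / 12,
                        scale_range=(0.9, 1.1), jitter_std=0.005):
    """Hypothetical augmentation for an (N, 3) point cloud.

    Returns the augmented points plus the rotation matrix and scale that
    were sampled, so a caller can apply the same transform elsewhere.
    """
    # Random rotation about the z axis (assumed to be the vertical axis).
    angle = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])

    # Random uniform scale and small per-point Gaussian noise.
    scale = rng.uniform(*scale_range)
    jitter = rng.normal(0.0, jitter_std, size=points.shape)

    augmented = scale * (points @ rot.T) + jitter
    return augmented, rot, scale
```

In an imitation-learning setting, actions expressed in the same coordinate frame as the point cloud would typically need the same rotation and scale applied to stay consistent with the observation, which is why the sketch returns them alongside the augmented points.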

📝 Abstract
3D policy learning promises superior generalization and cross-embodiment transfer, but progress has been hindered by training instabilities and severe overfitting, precluding the adoption of powerful 3D perception models. In this work, we systematically diagnose these failures, identifying the omission of 3D data augmentation and the adverse effects of Batch Normalization as primary causes. We propose a new architecture coupling a scalable transformer-based 3D encoder with a diffusion decoder, engineered specifically for stability at scale and designed to leverage large-scale pre-training. Our approach significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, establishing a new and robust foundation for scalable 3D imitation learning. Project Page: https://r3d-policy.github.io/
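The abstract names Batch Normalization as a cause of failure but does not describe the mechanism here. As background only, the following NumPy sketch illustrates one well-known property of training-mode batch normalization: a sample's normalized features depend on the other samples in its batch, whereas a per-sample scheme such as layer normalization does not have this coupling. The use of layer normalization as the comparison, and the helper names `batch_norm_train`, `layer_norm`, and `first_sample_shift`, are assumptions for illustration, not the authors' stated replacement.

```python
import numpy as np


def batch_norm_train(x, eps=1e-5):
    """Training-mode batch norm without learned affine parameters.

    Each feature is normalized with statistics taken across the batch axis.
    """
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm(x, eps=1e-5):
    """Layer norm without learned affine parameters.

    Each sample is normalized with statistics taken across its own features.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def first_sample_shift(norm_fn, seed=0):
    """Place the same sample in two different batches and return the largest
    absolute change in its normalized features."""
    rng = np.random.default_rng(seed)
    sample = rng.normal(size=(1, 8))
    batch_a = np.concatenate([sample, rng.normal(size=(3, 8))])
    batch_b = np.concatenate([sample, 5.0 + rng.normal(size=(3, 8))])
    return float(np.abs(norm_fn(batch_a)[0] - norm_fn(batch_b)[0]).max())
```

Calling `first_sample_shift(batch_norm_train)` gives a clearly nonzero value because the batch statistics change with the companion samples, while `first_sample_shift(layer_norm)` stays at zero up to floating-point error.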

Problem

Research questions and friction points this paper is trying to address.

3D policy learning
training instability
overfitting
3D perception
cross-embodiment transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D policy learning
transformer-based 3D encoder
diffusion decoder
3D data augmentation
scalable imitation learning