Efficient Camera Pose Augmentation for View Generalization in Robotic Policy Learning

📅 2026-03-30
🤖 AI Summary
Existing 2D visuomotor policies generalize poorly to novel viewpoints because they rely on static observations. This work proposes GenSplat, a feed-forward 3D Gaussian Splatting (3DGS) framework that uses a permutation-equivariant network to reconstruct high-fidelity 3D scenes from sparse, uncalibrated images in a single forward pass. To prevent geometric collapse, the method regularizes 3DGS training with 3D-prior distillation, then trains the policy on synthetic views rendered from the reconstructed scenes. By grounding the policy's decisions in consistent underlying 3D structure, GenSplat improves robustness to viewpoint changes: it outperforms baseline approaches under severe spatial perturbations and improves task success rates from unseen viewpoints.
📝 Abstract
Prevailing 2D-centric visuomotor policies exhibit a pronounced deficiency in novel view generalization, as their reliance on static observations hinders consistent action mapping across unseen views. In response, we introduce GenSplat, a feed-forward 3D Gaussian Splatting framework that facilitates view-generalized policy learning through novel view rendering. GenSplat employs a permutation-equivariant architecture to reconstruct high-fidelity 3D scenes from sparse, uncalibrated inputs in a single forward pass. To ensure structural integrity, we design a 3D-prior distillation strategy that regularizes the 3DGS optimization, preventing the geometric collapse typical of purely photometric supervision. By rendering diverse synthetic views from these stable 3D representations, we systematically augment the observational manifold during training. This augmentation forces the policy to ground its decisions in underlying 3D structures, thereby ensuring robust execution under severe spatial perturbations where baselines severely degrade.
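The view-augmentation step described in the abstract depends on choosing camera poses to render from. The sketch below illustrates one plausible way to sample such poses with NumPy: it perturbs a nominal camera position within a bounded angle and distance around a fixed look-at target. It is not the authors' implementation. The function names (`look_at`, `sample_augmented_poses`), the sampling scheme, the camera convention (camera looks down its local -Z axis), and the default ranges are all assumptions made for illustration, and the 3DGS rendering itself is not shown.

```python
import numpy as np


def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world matrix for a camera at `eye` facing `target`.

    Assumed convention (hypothetical, not from the paper): the camera looks
    down its local -Z axis. `up` must not be parallel to the view direction.
    """
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)

    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = true_up
    c2w[:3, 2] = -forward
    c2w[:3, 3] = eye
    return c2w


def sample_augmented_poses(nominal_eye, target, n_views,
                           max_angle_deg=30.0, radius_jitter=0.1, rng=None):
    """Sample `n_views` camera poses near a nominal viewpoint.

    Each pose rotates the nominal eye-to-target offset by a random angle of at
    most `max_angle_deg` about a random axis, then rescales the camera distance
    by a factor in [1 - radius_jitter, 1 + radius_jitter]. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    offset = nominal_eye - target
    poses = []
    for _ in range(n_views):
        axis = rng.normal(size=3)
        axis = axis / np.linalg.norm(axis)
        angle = np.deg2rad(rng.uniform(0.0, max_angle_deg))

        # Rodrigues' formula: rotation by `angle` about the unit vector `axis`.
        k = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]])
        rot = np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

        scale = 1.0 + rng.uniform(-radius_jitter, radius_jitter)
        eye = target + scale * (rot @ offset)
        poses.append(look_at(eye, target))
    return np.stack(poses)


if __name__ == "__main__":
    nominal = np.array([1.0, 0.0, 0.5])   # hypothetical nominal camera position
    workspace = np.zeros(3)               # hypothetical look-at target
    poses = sample_augmented_poses(nominal, workspace, n_views=4,
                                   rng=np.random.default_rng(0))
    print(poses.shape)
```

In a pipeline of the kind the abstract describes, each sampled camera-to-world matrix would be passed to a renderer over the reconstructed scene, and the rendered image paired with the original action label for policy training. Bounding the perturbation keeps the workspace in view and avoids the degenerate case where the view direction aligns with the `up` vector.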
Problem

Research questions and friction points this paper is trying to address.

view generalization
camera pose
visuomotor policy
3D representation
novel view
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
view generalization
permutation-equivariant architecture
3D-prior distillation
camera pose augmentation
Sen Wang
Xi’an Jiaotong University, Noah’s Ark Lab
Huaiyi Dong
Xi’an Jiaotong University, Noah’s Ark Lab
Jingyi Tian
Xi’an Jiaotong University, Noah’s Ark Lab
Jiayi Li
Xi’an Jiaotong University, Noah’s Ark Lab
Zhuo Yang
Xidian University & Shanghai AI Laboratory
Large Language Model, AI for Science
Tongtong Cao
Researcher, Huawei Noah's Ark Lab
Robotics, Embodied AI, Autonomous driving
Anlin Chen
Xi’an Jiaotong University, Noah’s Ark Lab
Shuang Wu
Noah's Ark Lab Huawei
Le Wang
Xi'an Jiaotong University
Computer Vision, Image Processing
Sanping Zhou
Xi'an Jiaotong University
Computer Vision, Machine Learning