🤖 AI Summary
To address the redundant visual information and severe occlusions that hamper multi-camera 3D robotic manipulation and lower operational efficiency, this paper proposes a task-driven virtual viewpoint generation method. The core innovation is the "virtual eye" mechanism: leveraging foundation models and 3D point cloud representations, it integrates a depth-aware perception module with a dynamic coarse-to-fine decoding strategy to adaptively synthesize task-optimal virtual viewpoints, effectively suppressing irrelevant visual distractions. The method enables end-to-end joint optimization of view synthesis and action planning, and it outperforms state-of-the-art methods on both the RLBench simulation benchmark and real-world evaluations. Training and inference are accelerated by 1.89× and 1.54×, respectively, while robustness to occlusion and action precision are significantly improved.
📝 Abstract
When performing 3D manipulation tasks, robots must plan actions based on perception from multiple fixed cameras. This multi-camera setup introduces substantial redundancy and irrelevant information, which increases computational cost and forces the model to spend extra training time extracting crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose VERM (Virtual Eye for Robotic Manipulation), which leverages the knowledge in foundation models to imagine a virtual, task-adaptive view from a constructed 3D point cloud, efficiently capturing necessary information and mitigating occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experiments on both the RLBench simulation benchmark and real-world evaluations demonstrate the effectiveness of our method, which surpasses previous state-of-the-art methods while achieving a 1.89× speedup in training and a 1.54× speedup in inference. More results can be found on our project website at https://verm-ral.github.io.
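The abstract's core step, rendering a virtual view from a constructed 3D point cloud, can be illustrated with a minimal pinhole-projection sketch. This is not the paper's implementation: the function name `render_virtual_view`, the intrinsics (`fx`, `fy`, `cx`, `cy`), the image size, and the world-to-camera pose convention are all assumptions for illustration; VERM additionally selects the viewpoint in a task-driven way, which is omitted here.

```python
import numpy as np

def render_virtual_view(points, colors, cam_pose, fx=100.0, fy=100.0,
                        cx=32.0, cy=32.0, h=64, w=64):
    """Project a colored point cloud into a hypothetical virtual pinhole camera.

    points: (N, 3) world coordinates; colors: (N, 3) RGB values.
    cam_pose: 4x4 world-to-camera transform (an assumed convention).
    Returns an (h, w, 3) image and an (h, w) depth map.
    """
    # Transform points into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam_pts = (cam_pose @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    front = cam_pts[:, 2] > 1e-6
    cam_pts, colors = cam_pts[front], colors[front]

    # Pinhole projection to integer pixel coordinates.
    u = np.round(fx * cam_pts[:, 0] / cam_pts[:, 2] + cx).astype(int)
    v = np.round(fy * cam_pts[:, 1] / cam_pts[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v = u[inside], v[inside]
    z, c = cam_pts[inside, 2], colors[inside]

    # Z-buffering: draw far points first so nearer points overwrite them,
    # which is what lets a well-placed virtual view suppress occluders.
    image = np.zeros((h, w, 3))
    depth = np.full((h, w), np.inf)
    order = np.argsort(-z)
    image[v[order], u[order]] = c[order]
    depth[v[order], u[order]] = z[order]
    return image, depth
```

With an identity pose, a single red point one meter ahead of the camera lands at the principal point `(cx, cy)` with depth 1.0; in the full method, the rendered view would then feed the action-planning head instead of the raw multi-camera images.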