🤖 AI Summary
This study addresses the lack of a systematic understanding of how fisheye cameras affect robot imitation learning, focusing on spatial localization, scene generalization, and hardware generalization. It presents the first comprehensive evaluation, in both simulation and real-world experiments, of how the wide field of view of wrist-mounted fisheye cameras influences policy learning. To mitigate cross-camera transfer failures, which the authors attribute to scale overfitting, the work introduces Random Scale Augmentation (RSA). Results show that the wide field of view significantly enhances spatial localization, although the benefit depends on the visual complexity of the environment. Fisheye-trained policies tend to overfit in simple scenes but generalize better across scenes when trained with sufficient environmental diversity. RSA, in turn, improves hardware generalization across different cameras. The work provides empirical evidence and practical guidance for collecting and using fisheye data in robotic manipulation.
📝 Abstract
The adoption of fisheye cameras in robotic manipulation, driven by their exceptionally wide Field of View (FoV), is rapidly outpacing a systematic understanding of their downstream effects on policy learning. This paper presents the first comprehensive empirical study to bridge this gap, rigorously analyzing the properties of wrist-mounted fisheye cameras for imitation learning. Through extensive experiments in both simulation and the real world, we investigate three critical research questions concerning spatial localization, scene generalization, and hardware generalization. Our investigation reveals that: (1) the wide FoV significantly enhances spatial localization, but this benefit is critically contingent on the visual complexity of the environment; (2) fisheye-trained policies, while prone to overfitting in simple scenes, unlock superior scene generalization when trained with sufficient environmental diversity; and (3) while naive cross-camera transfer leads to failures, we identify the root cause as scale overfitting and demonstrate that hardware generalization performance can be improved with a simple Random Scale Augmentation (RSA) strategy. Collectively, our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning. More results and videos are available at https://robo-fisheye.github.io/
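
The abstract does not spell out how RSA is implemented, so the following is only a minimal sketch of one plausible reading: rescale each training image about its center by a random factor and keep the original resolution, so the policy sees objects at varying apparent scales. The function name `random_scale_augment`, the default `scale_range`, the nearest-neighbor resampling, and the zero padding are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def random_scale_augment(image, scale_range=(0.7, 1.3), rng=None):
    """Hypothetical RSA sketch: zoom an image about its center by a random factor.

    The output has the same shape and dtype as the input. Scales below 1
    shrink the content and leave a zero-padded border; scales above 1
    magnify the content and crop the edges.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    scale = rng.uniform(*scale_range)

    # Inverse mapping: for each output pixel, locate its source pixel
    # by undoing the scaling around the image center.
    cy, cx = (h - 1) / 2, (w - 1) / 2
    src_y = np.rint((np.arange(h) - cy) / scale + cy).astype(int)
    src_x = np.rint((np.arange(w) - cx) / scale + cx).astype(int)

    # Output pixels whose source falls outside the image stay zero.
    valid = ((src_y >= 0) & (src_y < h))[:, None] & ((src_x >= 0) & (src_x < w))[None, :]

    # Nearest-neighbor gather (assumed for simplicity; a real pipeline
    # would likely use bilinear interpolation).
    gathered = image[np.clip(src_y, 0, h - 1)][:, np.clip(src_x, 0, w - 1)]

    out = np.zeros_like(image)
    out[valid] = gathered[valid]
    return out, scale
```

In a training loop, such a function would be applied independently to each sampled frame (or with one shared scale per trajectory, which is another design choice the abstract leaves open) before the image is passed to the policy.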