🤖 AI Summary
To address the performance degradation of 3D object detection with surround-view fisheye cameras, this paper proposes end-to-end detection frameworks explicitly tailored to fisheye geometry. The core idea is a spherical-space representation that models fisheye distortion directly, which yields two architectures: FisheyeBEVDet (built on the bird's-eye-view paradigm) and FisheyePETR (built on the query-based paradigm). To support systematic research, the authors introduce Fisheye3DOD, the first benchmark dataset dedicated to surround-view fisheye 3D object detection. It contains multi-view fisheye-pinhole image pairs with precise 3D annotations, synthesized in CARLA. Experiments on Fisheye3DOD show that the proposed methods outperform pinhole-based baselines by up to 6.2% in AP₅₀ on this benchmark.
📝 Abstract
In this work, we explore the technical feasibility of implementing end-to-end 3D object detection (3DOD) with a surround-view fisheye camera system. Specifically, we first investigate the performance drop incurred when transferring classic pinhole-based 3D object detectors to fisheye imagery. To mitigate this drop, we develop two methods that incorporate the unique geometry of fisheye images into mainstream detection frameworks: one based on the bird's-eye-view (BEV) paradigm, named FisheyeBEVDet, and the other on the query-based paradigm, named FisheyePETR. Both methods adopt spherical spatial representations to capture fisheye geometry. Given the lack of dedicated evaluation benchmarks, we release Fisheye3DOD, a new open dataset synthesized using CARLA that features both standard pinhole and fisheye camera arrays. Experiments on Fisheye3DOD show that our fisheye-compatible modeling improves accuracy by up to 6.2% over baseline methods.
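To make the idea of a spherical spatial representation more concrete, here is a minimal sketch that contrasts pinhole and fisheye projection and lifts a fisheye pixel to 3D along a unit-sphere ray. It assumes the equidistant fisheye model (image radius r = f·θ) with no higher-order distortion terms; the abstract does not say which camera model FisheyeBEVDet or FisheyePETR actually use, and every function name and intrinsic value below is hypothetical.

```python
import math

# Hypothetical helpers illustrating a spherical (ray-based) camera representation.
# Assumption: equidistant fisheye model r = f * theta, no higher-order distortion.

def pinhole_radius(theta: float, f: float) -> float:
    """Rectilinear (pinhole) image radius; blows up as theta approaches 90 degrees."""
    return f * math.tan(theta)

def fisheye_radius(theta: float, f: float) -> float:
    """Equidistant fisheye image radius; stays finite for any theta in [0, pi]."""
    return f * theta

def pixel_to_ray(u, v, f, cx, cy):
    """Back-project a fisheye pixel to a unit-length ray, i.e. a point on the unit sphere."""
    dx, dy = u - cx, v - cy
    theta = math.hypot(dx, dy) / f  # polar angle from the optical axis
    phi = math.atan2(dy, dx)        # azimuth around the optical axis
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def ray_to_pixel(x, y, z, f, cx, cy):
    """Project a 3D point (camera frame) to a fisheye pixel via its spherical angles."""
    norm = math.sqrt(x * x + y * y + z * z)
    cos_theta = max(-1.0, min(1.0, z / norm))  # clamp against rounding error
    theta = math.acos(cos_theta)
    phi = math.atan2(y, x)
    r = fisheye_radius(theta, f)
    return cx + r * math.cos(phi), cy + r * math.sin(phi)

def lift_pixel(u, v, ray_depth, f, cx, cy):
    """Lift a pixel to 3D using distance along the ray (spherical depth)
    instead of pinhole-style depth along the optical (z) axis."""
    rx, ry, rz = pixel_to_ray(u, v, f, cx, cy)
    return ray_depth * rx, ray_depth * ry, ray_depth * rz

if __name__ == "__main__":
    f, cx, cy = 320.0, 640.0, 640.0  # hypothetical intrinsics
    # A point slightly behind the image plane (theta > 90 deg) has no pinhole
    # projection, but it still maps to a valid fisheye pixel.
    u, v = ray_to_pixel(1.0, 0.0, -0.1, f, cx, cy)
    # Lifting that pixel with its ray distance recovers approximately (1.0, 0.0, -0.1).
    print(lift_pixel(u, v, math.sqrt(1.01), f, cx, cy))
```

Parameterizing depth as distance along the ray keeps the lifting step valid over the whole fisheye field of view, whereas depth measured along the optical axis becomes zero or negative for rays at or beyond 90° from that axis.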