AI Summary
To address the geometric ambiguity arising from insufficient geometric priors in autonomous-driving 3D perception, and the scarcity of annotated fisheye surround-view datasets, this paper proposes OmniDepth-Occ. First, it introduces panoramic depth estimation as a geometric prior. Second, it designs a cylindrical voxel representation tailored to fisheye radial distortion, performing voxelization in polar coordinates. Third, it establishes a two-stage Sketch-Coloring occupancy prediction paradigm: depth-guided coarse-grained sketch generation followed by fine-grained coloring-based reconstruction. To support training and evaluation, the authors construct a large-scale synthetic fisheye multi-camera semantic occupancy dataset, twice the size of SemanticKITTI. Experiments demonstrate significant improvements in 3D occupancy prediction accuracy on both the synthetic benchmark and standard real-world benchmarks, effectively mitigating geometric ambiguity and enhancing perceptual robustness in surround-view scenarios.
Abstract
Accurate 3D perception is essential for autonomous driving. Traditional methods often struggle with geometric ambiguity due to a lack of geometric priors. To address this challenge, we use omnidirectional depth estimation to introduce a geometric prior. Building on this depth information, we propose a Sketch-Coloring framework, OmniDepth-Occ. Additionally, our approach introduces a cylindrical voxel representation based on polar coordinates to better align with the radial nature of panoramic camera views. To address the lack of fisheye camera datasets for autonomous driving tasks, we also build a virtual scene dataset with six fisheye cameras, with twice the data volume of SemanticKITTI. Experimental results demonstrate that our Sketch-Coloring network significantly enhances 3D perception performance.
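To make the cylindrical voxel idea concrete, here is a minimal sketch of voxelization in polar coordinates: points are binned uniformly in radius, azimuth, and height rather than in a Cartesian grid, so angular resolution follows the radial layout of surround-view cameras. The function name and all grid parameters below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def cylindrical_voxel_indices(points, r_max=50.0, z_min=-3.0, z_max=5.0,
                              n_r=64, n_theta=180, n_z=16):
    """Map Cartesian points (N, 3) to cylindrical voxel indices (N, 3).

    Bins are uniform in radius r, azimuth theta, and height z; the grid
    shape and ranges here are hypothetical, chosen only for illustration.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)          # radial distance from ego vehicle
    theta = np.arctan2(y, x)              # azimuth in [-pi, pi]
    r_idx = np.clip((r / r_max * n_r).astype(int), 0, n_r - 1)
    theta_idx = np.clip(((theta + np.pi) / (2 * np.pi) * n_theta).astype(int),
                        0, n_theta - 1)
    z_idx = np.clip(((z - z_min) / (z_max - z_min) * n_z).astype(int),
                    0, n_z - 1)
    return np.stack([r_idx, theta_idx, z_idx], axis=1)
```

Note that each azimuth bin spans a fixed angle, so voxels grow wider with distance; this matches how a fisheye pixel's footprint expands radially, unlike a uniform Cartesian grid.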