🤖 AI Summary
Existing 4D panoptic occupancy tracking benchmarks lack support for surround-view fisheye cameras, long temporal sequences, and voxel-level instance tracking, which hinders consistent and continuous understanding of dynamic 3D scenes. To address this, the work proposes OccTrack360, a benchmark that introduces the first long-sequence, diverse dataset tailored to surround-view fisheye cameras, accompanied by voxel visibility annotations. It also presents the FoSOcc framework, which combines spherical projection modeling with a focus-guided mechanism to jointly mitigate fisheye distortion and voxel localization errors through unified spherical upsampling, fisheye field-of-view masking, omnidirectional occlusion modeling, and supervised spatial focusing. Experiments on Occ3D-Waymo and OccTrack360 show improved tracking performance, with the most notable gains on geometrically regular object categories, establishing a strong baseline for future research.
📝 Abstract
Understanding dynamic 3D environments in a spatially continuous and temporally consistent manner is fundamental for robotics and autonomous driving. While recent advances in occupancy prediction provide a unified representation of scene geometry and semantics, progress in 4D panoptic occupancy tracking remains limited by the lack of benchmarks that support surround-view fisheye sensing, long temporal sequences, and instance-level voxel tracking. To address this gap, we present OccTrack360, a new benchmark for 4D panoptic occupancy tracking from surround-view fisheye cameras. OccTrack360 provides substantially longer and more diverse sequences (174 to 2,234 frames) than prior benchmarks, together with principled voxel visibility annotations, including an all-direction occlusion mask and an MEI-based fisheye field-of-view mask. To establish a strong fisheye-oriented baseline, we further propose Focus on Sphere Occ (FoSOcc), a framework that addresses two core challenges in fisheye occupancy tracking: distorted spherical projection and inaccurate voxel-space localization. FoSOcc includes a Center Focusing Module (CFM) to enhance instance-aware spatial localization through supervised focus guidance, and a Spherical Lift Module (SLM) that extends perspective lifting to fisheye imaging under the Unified Projection Model. Extensive experiments on Occ3D-Waymo and OccTrack360 show that our method improves occupancy tracking quality, with notable gains on geometrically regular categories, and establishes a strong baseline for future research on surround-view fisheye 4D occupancy tracking. The benchmark and source code will be made publicly available at https://github.com/YouthZest-Lin/OccTrack360.
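To illustrate the geometry behind a fisheye field-of-view mask under the Unified Projection Model, the following minimal sketch projects a 3D point (for example, a voxel center already expressed in the camera frame) onto the unit sphere, shifts the projection center by the mirror parameter `xi`, and checks whether the resulting pixel lies inside the image. It is not the authors' implementation: the function names `ucm_project` and `in_fov`, the intrinsic values, and the image size are hypothetical, lens distortion is ignored, and the validity test (`zs + xi > 0`) is a simplification.

```python
import math


def ucm_project(point, fx, fy, cx, cy, xi):
    """Project a camera-frame 3D point with the unified (sphere) camera model.

    Returns (u, v) in pixels, or None when the point cannot be projected.
    Distortion terms are deliberately omitted (assumption for brevity).
    """
    x, y, z = point
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None  # the camera center itself has no projection
    # Step 1: project the point onto the unit sphere.
    xs, ys, zs = x / norm, y / norm, z / norm
    # Step 2: shift the projection center by xi along the optical axis.
    denom = zs + xi
    if denom <= 1e-6:
        return None  # simplified validity test (assumption)
    # Step 3: perspective projection followed by the pinhole intrinsics.
    return (fx * xs / denom + cx, fy * ys / denom + cy)


def in_fov(point, width, height, **intrinsics):
    """True if the point projects inside a width x height image."""
    uv = ucm_project(point, **intrinsics)
    if uv is None:
        return False
    u, v = uv
    return 0.0 <= u < width and 0.0 <= v < height


# Hypothetical intrinsics and image size, chosen only for illustration.
intrinsics = dict(fx=300.0, fy=300.0, cx=320.0, cy=240.0, xi=1.0)
voxel_centers = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (5.0, 0.0, 0.0), (5.0, 0.0, -1.0)]
fov_mask = [in_fov(p, 640, 480, **intrinsics) for p in voxel_centers]
```

With these illustrative values, a point 90 degrees off the optical axis still falls inside the image, which a pinhole model could not represent; a point directly behind the camera is rejected.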