AI Summary
This paper addresses two related tasks in autonomous driving, single-frame 3D occupancy prediction and multi-frame future occupancy forecasting, and proposes UniOcc, the first unified benchmark covering both. UniOcc integrates real-world (nuScenes, Waymo) and simulated (CARLA, OpenCOOD) data sources, provides voxel-level 2D/3D occupancy annotations with optical flow labels, and supports evaluation in both ego-vehicle and cooperative driving scenarios. Methodologically, it introduces explicit voxel-flow supervision to improve temporal consistency, proposes novel ground-truth-free evaluation metrics for occupancy forecasting, and establishes a cross-domain joint training framework. Its core contribution is a threefold unification: across real and simulated data domains, between prediction and forecasting tasks, and between single-vehicle and cooperative perception settings. Experiments show consistent improvements over state-of-the-art methods: +6.2% average mIoU and a 23% reduction in temporal coherence error.
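The mIoU figure above is the standard per-class intersection-over-union averaged over semantic classes. As a minimal sketch of how such a score is computed over voxel grids (the function name, class layout, and `ignore_index` convention are illustrative assumptions, not the benchmark's actual evaluation code):

```python
import numpy as np

def occupancy_miou(pred, gt, num_classes, ignore_index=0):
    """Mean IoU over semantic classes in a voxel grid.

    pred, gt: integer arrays of shape (X, Y, Z) holding class IDs.
    ignore_index: class treated as free space / unlabeled (an assumption here).
    """
    ious = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        p, g = pred == c, gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        if union > 0:  # skip classes absent from both grids
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Averaging only over classes present in either grid keeps rare classes from silently inflating the score.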
Abstract
We introduce UniOcc, a comprehensive, unified benchmark for occupancy forecasting (i.e., predicting future occupancy from historical information) and current-frame occupancy prediction from camera images. UniOcc unifies data from multiple real-world datasets (i.e., nuScenes, Waymo) and high-fidelity driving simulators (i.e., CARLA, OpenCOOD), providing 2D/3D occupancy labels with per-voxel flow annotations and support for cooperative autonomous driving. For evaluation, unlike existing studies that rely on suboptimal pseudo labels, UniOcc incorporates novel metrics that do not depend on ground-truth occupancy, enabling robust assessment of additional aspects of occupancy quality. Through extensive experiments on state-of-the-art models, we demonstrate that large-scale, diverse training data and explicit flow information significantly enhance occupancy prediction and forecasting performance.
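One way to assess temporal coherence without ground-truth occupancy is to warp the current occupancy grid by the per-voxel flow and compare it against the next-frame prediction. The sketch below illustrates this idea under simplifying assumptions (binary occupancy, nearest-voxel warping); it is not the paper's actual metric, and all names are hypothetical:

```python
import numpy as np

def warp_occupancy(occ, flow):
    """Advect a binary occupancy grid by a per-voxel flow field.

    occ:  bool array of shape (X, Y, Z), occupancy at time t.
    flow: float array of shape (X, Y, Z, 3), voxel displacement t -> t+1
          (nearest-voxel rounding is a simplification).
    """
    warped = np.zeros_like(occ)
    idx = np.argwhere(occ)                      # coordinates of occupied voxels
    tgt = np.rint(idx + flow[occ]).astype(int)  # displaced coordinates
    # keep only targets that remain inside the grid
    ok = np.all((tgt >= 0) & (tgt < np.array(occ.shape)), axis=1)
    warped[tuple(tgt[ok].T)] = True
    return warped

def temporal_consistency(occ_t, flow_t, occ_t1):
    """IoU between the flow-warped grid and the next-frame prediction."""
    warped = warp_occupancy(occ_t, flow_t)
    inter = np.logical_and(warped, occ_t1).sum()
    union = np.logical_or(warped, occ_t1).sum()
    return inter / union if union else 1.0  # two empty grids are consistent
```

A score near 1 means the forecast moves occupied voxels in agreement with the flow field; a low score flags flicker or implausible motion, with no reference occupancy required.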