🤖 AI Summary
To address the efficiency-accuracy trade-off in multi-frame 3D occupancy prediction for autonomous driving, this paper proposes StreamOcc, a stream-based processing framework. StreamOcc has two key components: (1) stream-based voxel aggregation, which accumulates past observations through lightweight recurrent updates of voxel features, and (2) query-guided aggregation, which uses query-driven attention and instance-voxel alignment to merge instance-level features of dynamic objects into their corresponding voxel features. Unlike multi-frame fusion, which processes several past frames together, the streaming design reduces memory and latency overhead while preserving spatiotemporal context. On Occ3D-nuScenes, StreamOcc achieves state-of-the-art performance in real-time settings, reduces memory usage by more than 50% compared to previous methods, and refines fine-grained details of dynamic objects.
📝 Abstract
3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past observations to improve prediction accuracy, using a multi-frame fusion approach that processes multiple past frames together. However, these methods face a trade-off between efficiency and accuracy, which significantly limits their practicality. To mitigate this trade-off, we propose StreamOcc, a novel framework that aggregates spatiotemporal information in a stream-based manner. StreamOcc consists of two key components: (i) Stream-based Voxel Aggregation, which effectively accumulates past observations while minimizing computational costs, and (ii) Query-guided Aggregation, which recurrently aggregates instance-level features of dynamic objects into corresponding voxel features, refining fine-grained details of dynamic objects. Experiments on the Occ3D-nuScenes dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.
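The following toy sketch illustrates the general idea behind the two components and is not the authors' implementation. It assumes a sparse dictionary of voxel features, a fixed blending weight `alpha`, and a residual add for instance features; the names `StreamingVoxelMemory`, `blend`, `update`, and `inject_instances` are all hypothetical. Ego-motion alignment, attention, and any learned layers are omitted. The point it shows is that a streaming design keeps one running state rather than a buffer of past frames.

```python
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]   # integer voxel index (x, y, z)
Feature = List[float]          # per-voxel feature vector


def blend(old: Feature, new: Feature, alpha: float) -> Feature:
    """Convex combination of two feature vectors (assumed update rule)."""
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(old, new)]


class StreamingVoxelMemory:
    """Holds a single running voxel feature map, updated once per frame."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha                      # weight given to the newest frame
        self.memory: Dict[Voxel, Feature] = {}  # the only state carried over time

    def update(self, frame: Dict[Voxel, Feature]) -> Dict[Voxel, Feature]:
        """Fold the current frame into the memory instead of storing the frame."""
        for voxel, feat in frame.items():
            prev = self.memory.get(voxel)
            if prev is None:
                self.memory[voxel] = list(feat)  # first observation of this voxel
            else:
                self.memory[voxel] = blend(prev, feat, self.alpha)
        return self.memory

    def inject_instances(
        self, instances: Sequence[Tuple[Feature, Sequence[Voxel]]]
    ) -> None:
        """Add each instance-level feature to the voxels that instance covers.

        `instances` pairs a feature vector with the voxel indices it occupies;
        the residual add stands in for the query-guided aggregation step.
        """
        for inst_feat, voxels in instances:
            for voxel in voxels:
                if voxel in self.memory:
                    self.memory[voxel] = [
                        m + f for m, f in zip(self.memory[voxel], inst_feat)
                    ]


if __name__ == "__main__":
    mem = StreamingVoxelMemory(alpha=0.5)
    mem.update({(0, 0, 0): [1.0, 0.0]})
    mem.update({(0, 0, 0): [0.0, 1.0], (1, 0, 0): [2.0, 2.0]})
    mem.inject_instances([([1.0, 1.0], [(0, 0, 0)])])
    print(mem.memory)
```

In this sketch the memory footprint depends only on the number of observed voxels and not on how many past frames have been processed, which is the property that distinguishes streaming aggregation from fusing a fixed window of stored frames.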