Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the efficiency–accuracy trade-off in multi-frame 3D occupancy prediction for autonomous driving, this paper proposes StreamOcc, a stream-based framework. StreamOcc introduces two key components: (1) stream-based voxel aggregation and (2) query-guided aggregation of instance-level features. These are implemented via query-driven attention, lightweight recurrent voxel feature updating, and dynamic instance-voxel alignment learning, enabling low-overhead, high-fidelity spatiotemporal modeling. Unlike conventional multi-frame fusion, which processes multiple past frames together, StreamOcc alleviates memory and latency bottlenecks. Evaluated on Occ3D-nuScenes, it achieves state-of-the-art performance in real-time settings, reduces memory usage by more than 50% compared to previous methods, and improves modeling accuracy for dynamic objects.

📝 Abstract
3D occupancy prediction has emerged as a key perception task for autonomous driving, as it reconstructs 3D environments to provide a comprehensive scene understanding. Recent studies focus on integrating spatiotemporal information obtained from past observations to improve prediction accuracy, using a multi-frame fusion approach that processes multiple past frames together. However, these methods struggle with a trade-off between efficiency and accuracy, which significantly limits their practicality. To mitigate this trade-off, we propose StreamOcc, a novel framework that aggregates spatio-temporal information in a stream-based manner. StreamOcc consists of two key components: (i) Stream-based Voxel Aggregation, which effectively accumulates past observations while minimizing computational costs, and (ii) Query-guided Aggregation, which recurrently aggregates instance-level features of dynamic objects into corresponding voxel features, refining fine-grained details of dynamic objects. Experiments on the Occ3D-nuScenes dataset show that StreamOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by more than 50% compared to previous methods.

Problem

Research questions and friction points this paper is trying to address.

Balancing efficiency and accuracy in 3D occupancy prediction
Reducing computational costs in spatiotemporal information aggregation
Improving fine-grained details of dynamic objects in perception tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stream-based Voxel Aggregation for efficiency
Query-guided Aggregation for dynamic objects
Reduces memory usage by over 50%
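
The two ideas listed above can be illustrated with a minimal NumPy sketch. This is not the StreamOcc implementation: the function names (`stream_update`, `query_guided_update`, `run_stream`), the fixed blending weights, and the direct instance-to-voxel index assignment are all hypothetical stand-ins for the learned modules, and ego-motion alignment of the previous state is omitted. The sketch only shows why a stream-based scheme keeps a single voxel state in memory instead of a stack of past frames.

```python
import numpy as np


def stream_update(prev_state, current_feat, momentum=0.5):
    """Blend the running voxel state with the current frame's voxel features.

    Assumption: a fixed momentum stands in for a learned recurrent update.
    """
    if prev_state is None:
        return current_feat.copy()
    return momentum * prev_state + (1.0 - momentum) * current_feat


def query_guided_update(voxel_state, instance_feats, instance_voxels, weight=0.5):
    """Mix each instance-level feature into the voxel it is assigned to.

    Assumption: every instance maps to exactly one (x, y, z) voxel index.
    """
    out = voxel_state.copy()
    for feat, (x, y, z) in zip(instance_feats, instance_voxels):
        out[x, y, z] = (1.0 - weight) * out[x, y, z] + weight * feat
    return out


def run_stream(frames, instances_per_frame):
    """Process frames one at a time, carrying only a single voxel state."""
    state = None
    for feat, (inst_feats, inst_voxels) in zip(frames, instances_per_frame):
        state = stream_update(state, feat)
        state = query_guided_update(state, inst_feats, inst_voxels)
    return state


if __name__ == "__main__":
    # Three frames of (X, Y, Z, C) voxel features; one instance in each frame.
    rng = np.random.default_rng(0)
    frames = [rng.normal(size=(4, 4, 2, 8)) for _ in range(3)]
    instances = [([rng.normal(size=8)], [(1, 2, 0)]) for _ in range(3)]
    final_state = run_stream(frames, instances)
    print(final_state.shape)  # same shape as a single frame's features
```

However many frames are streamed, the carried state has the shape of one frame's voxel features, whereas a multi-frame fusion approach holds several past frames at once.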