🤖 AI Summary
In 3D occupancy prediction for autonomous driving, existing methods face a fundamental accuracy-efficiency trade-off: BEV-based approaches underperform on small objects, while sparse point-based methods are inefficient for large planar structures. To address this, we propose a dual-representation framework jointly modeling BEV and sparse point features. Our contributions are threefold: (1) a novel cross-attention fusion mechanism enabling geometric information sharing and end-to-end joint optimization between branches; (2) a query-driven sparse point branch synergistically integrated with the BEV feature stream—enhancing small-object perception while improving efficiency for planar surfaces; and (3) a differentiable occupancy distribution model leveraging dual Gaussian priors. Evaluated on Occ3D-nuScenes and Occ3D-Waymo, our method achieves state-of-the-art performance. It significantly outperforms dense voxel-based methods in inference speed and matches the efficiency of current top-performing lightweight approaches.
📝 Abstract
3D occupancy provides fine-grained 3D geometry and semantics for scene understanding, which is critical for autonomous driving. Most existing methods, however, carry high compute costs, requiring dense 3D feature volumes and cross-attention to effectively aggregate information. More recent works have adopted Bird's Eye View (BEV) or sparse points as the scene representation at much reduced cost, but each still suffers from its own shortcomings. More concretely, BEV struggles with small objects, which often suffer significant information loss after being projected onto the ground plane. Points, on the other hand, can flexibly model small objects in 3D but are inefficient at capturing flat surfaces or large objects. To address these challenges, we present ODG, a novel 3D occupancy prediction approach that combines BEV- and sparse-point-based representations. We propose a dual-branch design: a query-based sparse points branch and a BEV branch. The 3D information learned in the sparse points branch is shared with the BEV stream via cross-attention, enriching the weakened signals of difficult objects on the BEV plane. The outputs of both branches are finally fused to generate the predicted 3D occupancy. Extensive experiments on the Occ3D-nuScenes and Occ3D-Waymo benchmarks demonstrate the superiority of our proposed ODG. Moreover, ODG delivers competitive inference speed compared to the latest efficient approaches.
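To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of the fusion pattern described above: flattened BEV grid features act as queries that attend to sparse point query features via cross-attention, and the enriched BEV stream is then fused with the point branch to predict per-cell occupancy logits. All module names, dimensions, and the simple fusion head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Toy fusion of a BEV feature stream with a sparse point branch."""

    def __init__(self, dim=64, num_heads=4, num_classes=18):
        super().__init__()
        # Cross-attention: BEV cells (queries) gather 3D cues from point features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Hypothetical fusion head mapping fused features to per-class logits.
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, bev_feats, point_feats):
        # bev_feats:   (B, H*W, C) flattened BEV grid features
        # point_feats: (B, N, C)   sparse point query features
        attended, _ = self.cross_attn(bev_feats, point_feats, point_feats)
        enriched = self.norm(bev_feats + attended)  # residual update of BEV
        # Fuse the enriched BEV stream with a pooled summary of the point branch.
        pooled = point_feats.mean(dim=1, keepdim=True).expand_as(enriched)
        return self.head(torch.cat([enriched, pooled], dim=-1))

B, H, W, N, C = 2, 8, 8, 32, 64
model = DualBranchFusion(dim=C)
logits = model(torch.randn(B, H * W, C), torch.randn(B, N, C))
print(logits.shape)  # per-BEV-cell class logits: (2, 64, 18)
```

In this sketch the residual cross-attention update is what lets weak small-object signals on the BEV plane be reinforced by the point branch; the actual ODG fusion and occupancy decoding are more involved.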