🤖 AI Summary
Monocular semantic scene completion (SSC) suffers from severe underestimation of distant geometric structures due to perspective distortion and occlusion. To address this, we propose ScanSSC, a novel end-to-end framework mapping monocular images to 3D semantic voxel grids. Its core contributions are: (1) a tri-axial voxel scanning mechanism that enhances distant voxels' awareness of near-field contextual cues; (2) axial near-to-far cascaded masked self-attention, enabling spatially selective feature modeling; and (3) Scan Loss, which accumulates logits along each axis to provide gradient-guided optimization for distant regions. Evaluated on SemanticKITTI and SSCBench-KITTI-360, ScanSSC achieves IoU scores of 44.54 and 48.29, and mIoU scores of 17.40 and 20.14, respectively, setting a new state of the art for camera-based SSC.
📝 Abstract
Camera-based Semantic Scene Completion (SSC) is gaining attention in the 3D perception field. However, perspective distortion and occlusion lead to the underestimation of geometry in distant regions, posing a critical issue for safety-focused autonomous driving systems. To tackle this, we propose ScanSSC, a novel camera-based SSC model composed of a Scan Module and Scan Loss, both designed to enhance distant scenes by leveraging context from near-viewpoint scenes. The Scan Module uses axis-wise masked attention, where each axis employs a near-to-far cascade mask that enables distant voxels to capture relationships with preceding voxels. In addition, the Scan Loss computes the cross-entropy along each axis between cumulative logits and the corresponding class distributions in a near-to-far direction, thereby propagating rich context-aware signals to distant voxels. Leveraging the synergy between these components, ScanSSC achieves state-of-the-art performance, with IoUs of 44.54 and 48.29, and mIoUs of 17.40 and 20.14, on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, respectively.
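To make the two mechanisms described above concrete, here is a minimal 1-D NumPy sketch of (a) a near-to-far cascade attention mask and (b) a cumulative-logit cross-entropy in the spirit of Scan Loss. This is an illustrative simplification, not the paper's implementation: the actual model operates on 3-D voxel grids along three axes, and the function names, the single-axis setting, and the uniform normalization of the cumulative label distribution are assumptions made for this sketch.

```python
import numpy as np

def near_to_far_mask(n):
    """Cascade mask along one scanned axis (sketch): position i (farther)
    may attend to positions 0..i (nearer). Assumes index 0 is nearest."""
    return np.tril(np.ones((n, n), dtype=bool))

def scan_loss_1d(logits, labels):
    """Toy 1-D analogue of Scan Loss: cross-entropy between cumulative
    logits and cumulative (normalized) one-hot label distributions,
    accumulated in the near-to-far direction.
    logits: (n, c) array; labels: (n,) int class ids."""
    n, c = logits.shape
    cum_logits = np.cumsum(logits, axis=0)            # accumulate near -> far
    one_hot = np.eye(c)[labels]
    cum_dist = np.cumsum(one_hot, axis=0)
    cum_dist /= cum_dist.sum(axis=1, keepdims=True)   # prefix class distribution
    # numerically stable log-softmax over classes for each prefix
    z = cum_logits - cum_logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-(cum_dist * log_p).sum(axis=1).mean())
```

Because each prefix sum includes all nearer positions, the loss term at a distant position receives gradients that depend on near-field predictions, which is the intuition behind propagating near-viewpoint context to distant voxels.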