Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key bottlenecks in multi-view indoor 3D object detection—including fixed voxel receptive fields, excessive redundant computation in free space, and reliance on ground-truth scene geometry—this paper proposes SGCDet, a novel detection framework. Methodologically, SGCDet introduces three core innovations: (1) a geometry- and context-aware multi-view feature aggregation module that enables dynamic, weighted fusion of features across views; (2) a sparse adaptive 3D voxel construction mechanism that models only high-probability occupied regions, eliminating dependence on geometric supervision; and (3) a learnable adaptive receptive field to enhance voxel feature discriminability. Evaluated on ScanNet, ScanNet200, and ARKitScenes, SGCDet achieves state-of-the-art performance, with significant improvements in detection accuracy and approximately 40% reduction in redundant computation—demonstrating superior balance between precision and efficiency.

Technology Category

Application Category

📝 Abstract
This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. Unlike previous approaches that restrict the receptive field of voxels to fixed locations on images, we introduce a geometry and context aware aggregation module to integrate geometric and contextual information within adaptive regions in each image and dynamically adjust the contributions from different views, enhancing the representation capability of voxel features. Furthermore, we propose a sparse volume construction strategy that adaptively identifies and selects voxels with high occupancy probabilities for feature refinement, minimizing redundant computation in free space. Benefiting from the above designs, our framework achieves effective and efficient volume construction in an adaptive way. Better still, our network can be supervised using only 3D bounding boxes, eliminating the dependence on ground-truth scene geometry. Experimental results demonstrate that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200 and ARKitScenes datasets. The source code is available at https://github.com/RM-Zhang/SGCDet.
Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-view 3D object detection via adaptive volume construction
Reducing redundant computation with sparse volume selection
Eliminating dependency on ground-truth scene geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry and context aware aggregation module
Sparse volume construction strategy
Supervised using only 3D bounding boxes
🔎 Similar Papers
No similar papers found.
R
Runmin Zhang
College of Information Science and Electronic Engineering, Zhejiang University
Z
Zhu Yu
College of Information Science and Electronic Engineering, Zhejiang University
Si-Yuan Cao
Si-Yuan Cao
Zhejiang University
image alignmenthomography estimationimage fusionplace recognition
L
Lingyu Zhu
City University of Hong Kong
G
Guangyi Zhang
College of Information Science and Electronic Engineering, Zhejiang University
Xiaokai Bai
Xiaokai Bai
Zhejiang University Ph.D student
Multimodal Fusion3D object detection4D Radar Perceptionautonomous driving
H
Hui-Liang Shen
College of Information Science and Electronic Engineering, Zhejiang University