🤖 AI Summary
To address key bottlenecks in multi-view indoor 3D object detection, namely fixed voxel receptive fields, excessive redundant computation in free space, and reliance on ground-truth scene geometry, this paper proposes SGCDet, a novel detection framework. Methodologically, SGCDet introduces three core innovations: (1) a geometry- and context-aware multi-view feature aggregation module that enables dynamic, weighted fusion of features across views; (2) a sparse adaptive 3D voxel construction mechanism that models only regions with high occupancy probability, eliminating the dependence on geometric supervision; and (3) a learnable adaptive receptive field that enhances the discriminability of voxel features. Evaluated on ScanNet, ScanNet200, and ARKitScenes, SGCDet achieves state-of-the-art performance, with significant improvements in detection accuracy and an approximately 40% reduction in redundant computation, demonstrating a superior balance between precision and efficiency.
📝 Abstract
This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. Unlike previous approaches that restrict the receptive field of voxels to fixed locations on images, we introduce a geometry- and context-aware aggregation module that integrates geometric and contextual information within adaptive regions in each image and dynamically adjusts the contributions from different views, enhancing the representation capability of voxel features. Furthermore, we propose a sparse volume construction strategy that adaptively identifies and selects voxels with high occupancy probabilities for feature refinement, minimizing redundant computation in free space. Benefiting from the above designs, our framework achieves effective and efficient volume construction in an adaptive way. Better still, our network can be supervised using only 3D bounding boxes, eliminating the dependence on ground-truth scene geometry. Experimental results demonstrate that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200, and ARKitScenes datasets. The source code is available at https://github.com/RM-Zhang/SGCDet.
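The sparse volume construction idea described above can be illustrated with a minimal sketch: predict a per-voxel occupancy score, then keep only the top fraction of voxels for further feature refinement. This is a hypothetical illustration, not the paper's actual implementation; the function name `select_sparse_voxels`, the `keep_ratio` parameter, and the use of raw logits as scores are all assumptions for demonstration.

```python
import numpy as np

def select_sparse_voxels(occupancy_logits, keep_ratio=0.25):
    """Keep only the voxels most likely to be occupied.

    occupancy_logits: (D, H, W) array of per-voxel occupancy scores.
    Returns flat indices of the kept voxels, sorted by descending score.
    Voxels outside this set (likely free space) are skipped during
    refinement, which is where the computation savings come from.
    """
    flat = occupancy_logits.ravel()
    k = max(1, int(flat.size * keep_ratio))
    # argpartition finds the top-k scores in O(n); then sort just those k
    top = np.argpartition(-flat, k - 1)[:k]
    return top[np.argsort(-flat[top])]

# Toy example: an 8x8x8 volume, keeping 25% of voxels (hypothetical sizes)
rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 8, 8))
kept = select_sparse_voxels(logits, keep_ratio=0.25)
print(kept.size)  # → 128 of 512 voxels retained for refinement
```

In the actual framework, the occupancy scores would be predicted by the network and the selected voxels would receive further multi-view feature aggregation, while the remaining free-space voxels are left untouched.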