Real-Time 3D Object Detection with Inference-Aligned Learning

📅 2025-11-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In real-time 3D point cloud detection, training and inference are misaligned: training lacks explicit spatial reliability modeling and ranking awareness, while inference ranks bounding boxes by confidence score, which leads to suboptimal representation learning. To address this, we propose SR3D, the first framework to jointly introduce spatially prioritized optimal transport assignment and ranking-aware self-distillation for real-time indoor 3D detection, thereby aligning training objectives with inference mechanisms. Our method comprises: (1) a geometry-consistent, spatially prioritized matching strategy; (2) self-distillation guided by a ranking-sensitive loss; and (3) a lightweight spatial reliability modeling module. Evaluated on ScanNet V2 and SUN RGB-D, SR3D achieves real-time inference at ≈30 FPS while significantly outperforming state-of-the-art methods in mAP, empirically supporting the design goal of training-inference consistency.

📝 Abstract
Real-time 3D object detection from point clouds is essential for dynamic scene understanding in applications such as augmented reality, robotics, and navigation. We introduce a novel Spatial-prioritized and Rank-aware 3D object detection (SR3D) framework for indoor point clouds to bridge the gap between how detectors are trained and how they are evaluated. This gap stems from the lack of spatial reliability and ranking awareness during training, which conflicts with the ranking-based prediction selection used at inference. Such a training-inference gap hampers the model's ability to learn representations aligned with inference-time behavior. To address this limitation, SR3D introduces two training-time components tailored to the spatial nature of point clouds: a novel spatial-prioritized optimal transport assignment that dynamically emphasizes well-located and spatially reliable samples, and a rank-aware adaptive self-distillation scheme that injects ranking perception through self-distillation. Extensive experiments on ScanNet V2 and SUN RGB-D show that SR3D effectively bridges the training-inference gap and significantly outperforms prior methods in accuracy while maintaining real-time speed.
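
The abstract does not give the assignment formula, so the following is only a minimal sketch of how an optimal transport label assignment with a spatial term could look. It uses entropy-regularized Sinkhorn iterations in the style of generic OT-based assignment. The function names, the cost definition (classification cost plus `spatial_weight` times the distance to the ground-truth center), the background row, and all numbers are assumptions made for illustration, not SR3D's actual formulation.

```python
import math

def sinkhorn(cost, row_mass, col_mass, eps=0.1, iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost[i][j] is the cost of assigning candidate j to target i.
    Columns of the returned plan sum to col_mass after the final update;
    rows match row_mass only approximately, depending on `iters`.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def spatial_ot_assign(gt_centers, cand_points, cls_cost, k=2,
                      spatial_weight=1.0, bg_cost=1.0):
    """Assign each candidate to a ground-truth object or to background.

    Hypothetical cost: classification cost plus a spatial penalty that grows
    with the candidate's distance to the object center, so well-located
    candidates are cheaper to assign. Requires len(cand_points) > k * len(gt_centers).
    """
    n, m = len(gt_centers), len(cand_points)
    cost = [[cls_cost[i][j] + spatial_weight * math.dist(gt_centers[i], cand_points[j])
             for j in range(m)] for i in range(n)]
    cost.append([bg_cost] * m)                      # extra row: background
    row_mass = [float(k)] * n + [float(m - k * n)]  # each object supplies k labels
    col_mass = [1.0] * m                            # each candidate takes one label
    plan = sinkhorn(cost, row_mass, col_mass)
    # Each candidate gets the label that carries most of its mass in the plan.
    labels = [max(range(n + 1), key=lambda i: plan[i][j]) for j in range(m)]
    return labels, plan

# Toy scene: two objects, six candidate points (made-up coordinates).
gt_centers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
cand_points = [(0.1, 0.0, 0.0), (0.3, 0.0, 0.0), (3.9, 0.0, 0.0),
               (4.2, 0.0, 0.0), (2.0, 2.0, 0.0), (2.0, -2.0, 0.0)]
cls_cost = [[0.0] * len(cand_points) for _ in gt_centers]
labels, plan = spatial_ot_assign(gt_centers, cand_points, cls_cost)
print(labels)  # index 2 denotes background
```

In this toy scene the two candidates nearest each object center become its positives and the two distant candidates fall to background. Raising `spatial_weight` makes the assignment depend more heavily on location.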

Problem

Research questions and friction points this paper is trying to address.

Bridges the gap between how detectors are trained and how they are evaluated
Addresses the lack of spatial reliability modeling during 3D object detection training
Addresses the lack of ranking awareness during training, which conflicts with ranking-based prediction selection at inference
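
To make the friction point concrete, here is a small illustrative sketch of ranking-based selection at inference: predictions are sorted by confidence and only the top ones are kept. The prediction records, scores, and IoU values are invented for the example. The pair-counting helper is a hypothetical diagnostic, not something described in the paper.

```python
from itertools import combinations

def rank_predictions(preds, top_k):
    """Keep the top_k predictions by confidence score."""
    return sorted(preds, key=lambda p: p["score"], reverse=True)[:top_k]

def discordant_pairs(preds):
    """Count pairs whose confidence order disagrees with their IoU order."""
    return sum(
        1
        for a, b in combinations(preds, 2)
        if (a["score"] - b["score"]) * (a["iou"] - b["iou"]) < 0
    )

# Made-up predictions for one object: confidence and IoU with the ground truth.
preds = [
    {"name": "a", "score": 0.90, "iou": 0.35},
    {"name": "b", "score": 0.60, "iou": 0.80},
    {"name": "c", "score": 0.40, "iou": 0.55},
]

kept = rank_predictions(preds, top_k=1)
best_localized = max(preds, key=lambda p: p["iou"])
print(kept[0]["name"], best_localized["name"], discordant_pairs(preds))
```

Here the box that survives ranking (`a`) is not the best-localized one (`b`), and two of the three pairs are ordered differently by confidence and by IoU. A training objective that ignores ranking gives the model no direct signal to fix this.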

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial-prioritized optimal transport assignment that emphasizes well-located, spatially reliable samples
Rank-aware adaptive self-distillation scheme that injects ranking perception during training
SR3D framework that bridges the training-inference gap in real-time indoor 3D object detection
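
The page does not specify the self-distillation loss, so the sketch below substitutes a generic rank-weighted soft-target loss to show one way ranking information could enter training. As an assumption, the model's own localization quality (IoU of each predicted box with its matched ground truth, treated as a constant) serves as the teacher signal for the confidence score. Samples ranked higher by that quality get larger weights. The names `rank_weights` and `rank_aware_distill_loss` and the temperature `tau` are hypothetical.

```python
import math

def rank_weights(quality, tau=2.0):
    """Weight each sample by its rank under `quality` (rank 0 = best)."""
    order = sorted(range(len(quality)), key=lambda i: quality[i], reverse=True)
    weights = [0.0] * len(quality)
    for rank, idx in enumerate(order):
        weights[idx] = math.exp(-rank / tau)  # weight decays with rank
    return weights

def soft_bce(p, t, eps=1e-7):
    """Binary cross-entropy between prediction p and soft target t in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def rank_aware_distill_loss(scores, ious, tau=2.0):
    """Rank-weighted soft-target loss pulling confidence toward localization quality."""
    w = rank_weights(ious, tau)
    total = sum(wi * soft_bce(s, q) for wi, s, q in zip(w, scores, ious))
    return total / sum(w)

# Made-up values: confidence scores and IoUs of three positive samples.
ious = [0.35, 0.80, 0.55]
misranked_scores = [0.90, 0.60, 0.40]

print(rank_weights(ious))
print(rank_aware_distill_loss(misranked_scores, ious))
print(rank_aware_distill_loss(ious, ious))  # scores already match quality
```

The loss is lowest when the confidence scores equal the quality targets, which also makes the score ranking agree with the localization ranking. The rank weights concentrate the gradient on the samples most likely to survive inference-time selection.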