🤖 AI Summary
Monocular 3D object detection suffers significant performance degradation under sparse annotation scenarios, which are prevalent due to the high cost of 3D labeling. To address this challenge, this work proposes a detection framework tailored for sparse supervision, featuring two key innovations: (1) Road-Aware Patch Augmentation (RAPA), which pastes cropped object patches onto segmented road regions while preserving 3D geometric consistency, and (2) a Prototype-Based Filtering (PBF) mechanism that leverages prototype similarity and depth uncertainty to effectively mine informative pseudo-labels from unlabeled regions. Extensive experiments demonstrate that the proposed method substantially outperforms existing approaches across various sparse annotation settings, confirming its efficiency and robustness under low annotation budgets.
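The geometric consistency RAPA aims for can be illustrated with a minimal sketch: under a standard pinhole camera model, an object patch pasted at a new road location must be rescaled by the ratio of source to destination depth, and its paste position follows from projecting the sampled 3D road point into the image. The function name, arguments, and the simple pinhole assumption below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def rapa_paste_params(patch_hw, src_depth, road_point, K):
    """Compute where and at what size to paste an object patch so its
    apparent size stays consistent with its new road position.
    Illustrative sketch; assumes a pinhole camera with intrinsics K."""
    x, y, dst_depth = road_point          # 3D point sampled on the segmented road
    # Apparent size scales inversely with depth
    scale = src_depth / dst_depth
    new_h = int(round(patch_hw[0] * scale))
    new_w = int(round(patch_hw[1] * scale))
    # Pinhole projection of the road point gives the 2D paste location
    u = K[0, 0] * x / dst_depth + K[0, 2]
    v = K[1, 1] * y / dst_depth + K[1, 2]
    return (u, v), (new_h, new_w)
```

For example, a patch originally observed at 10 m pasted at a 20 m road point would be shrunk to half its height and width, matching how a real object would appear at the farther depth.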
📝 Abstract
Monocular 3D object detection has achieved impressive performance on densely annotated datasets. However, it struggles when only a fraction of objects are labeled due to the high cost of 3D annotation. This sparsely annotated setting is common in real-world scenarios where annotating every object is impractical. To address this, we propose a novel framework for sparsely annotated monocular 3D object detection with two key modules. First, we propose Road-Aware Patch Augmentation (RAPA), which leverages sparse annotations by augmenting segmented object patches onto road regions while preserving 3D geometric consistency. Second, we propose Prototype-Based Filtering (PBF), which generates high-quality pseudo-labels by filtering predictions through prototype similarity and depth uncertainty. It maintains global 2D RoI feature prototypes and selects pseudo-labels that are feature-consistent with the learned prototypes and carry reliable depth estimates. Our training strategy combines geometry-preserving augmentation with prototype-guided pseudo-labeling to achieve robust detection under sparse supervision. Extensive experiments demonstrate the effectiveness of the proposed method. The source code is available at https://github.com/VisualAIKHU/MonoSAOD.
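The PBF selection rule described above can be sketched as a dual-threshold filter: a prediction becomes a pseudo-label only if its RoI feature is similar enough to its class prototype and its predicted depth uncertainty is low. The function names, threshold values, and EMA-style prototype update below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def pbf_filter(roi_feats, prototypes, pred_classes, depth_sigma,
               sim_thresh=0.7, sigma_thresh=0.3):
    """Keep predictions that are feature-consistent with their class
    prototype AND have low depth uncertainty (illustrative thresholds)."""
    protos = prototypes[pred_classes]  # (N, D) prototype per prediction
    # Cosine similarity between each RoI feature and its class prototype
    sim = np.sum(roi_feats * protos, axis=1) / (
        np.linalg.norm(roi_feats, axis=1) * np.linalg.norm(protos, axis=1) + 1e-8)
    return (sim >= sim_thresh) & (depth_sigma <= sigma_thresh)

def update_prototypes(prototypes, roi_feats, labels, momentum=0.99):
    """EMA update of the global per-class prototypes from labeled RoI features."""
    for c in np.unique(labels):
        mean_feat = roi_feats[labels == c].mean(axis=0)
        prototypes[c] = momentum * prototypes[c] + (1.0 - momentum) * mean_feat
    return prototypes
```

In this sketch, a confident but feature-inconsistent prediction (e.g., high similarity to the wrong class prototype) is rejected, which is the intuition behind combining prototype similarity with depth uncertainty rather than relying on confidence scores alone.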