🤖 AI Summary
In multispectral pedestrian detection, sparse annotations lead to low-quality pseudo-labels and excessive reliance on the limited ground-truth labels. To address this, we propose SAMPD, a framework that leverages cross-modal knowledge to generate high-quality pseudo-labels and dynamically fuses ground-truth and pseudo-labels to diversify the pedestrian appearances seen during training. Its core contributions are: (1) Multispectral Pedestrian-aware Adaptive Weight (MPAW), which weights pseudo-labels in a modality-aware manner; (2) Positive Pseudo-label Enhancement (PPE), which improves the discriminability of pseudo-labels; and (3) Adaptive Pedestrian Retrieval Augmentation (APRA), which strengthens training by adaptively integrating pedestrian patches and high-quality pseudo-labels with the ground truth. Evaluated under sparsely annotated multispectral settings, SAMPD achieves significant improvements in detection accuracy, pseudo-label quality, and model generalization.
📝 Abstract
Although existing Sparsely Annotated Object Detection (SAOD) approaches have made progress in handling sparsely annotated environments in the multispectral domain, where only some pedestrians are annotated, they still have the following limitations: (i) they lack mechanisms for improving the quality of pseudo-labels for missing annotations, and (ii) they rely on fixed ground-truth annotations, which leads to learning only a limited range of pedestrian visual appearances in the multispectral domain. To address these issues, we propose a novel framework called Sparsely Annotated Multispectral Pedestrian Detection (SAMPD). For limitation (i), we introduce the Multispectral Pedestrian-aware Adaptive Weight (MPAW) and Positive Pseudo-label Enhancement (PPE) modules. Utilizing multispectral knowledge, these modules ensure the generation of high-quality pseudo-labels and enable effective learning by increasing the weights of high-quality pseudo-labels based on modality characteristics. To address limitation (ii), we propose an Adaptive Pedestrian Retrieval Augmentation (APRA) module, which adaptively incorporates pedestrian patches from the ground truth and dynamically integrates high-quality pseudo-labels with the ground truth, facilitating a more diverse learning pool of pedestrians. Extensive experimental results demonstrate that our SAMPD significantly enhances performance in sparsely annotated environments within the multispectral domain.
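The modality-aware weighting idea behind MPAW can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulation: the function names, the confidence threshold `tau`, and the agreement-based weighting rule are all assumptions made for clarity.

```python
# Illustrative sketch of modality-aware pseudo-label weighting in the spirit
# of MPAW. The weighting rule below is an assumption for exposition, not the
# paper's actual method.

def pseudo_label_weight(rgb_score: float, thermal_score: float,
                        tau: float = 0.7) -> float:
    """Weight a pseudo-label by its confidence in each modality.

    A pseudo-label supported confidently by both RGB and thermal gets a
    weight near its top score; one seen in only a single modality is
    down-weighted; low confidence in both modalities yields weight 0.
    """
    best = max(rgb_score, thermal_score)
    if best < tau:                           # unreliable in both modalities
        return 0.0
    agreement = min(rgb_score, thermal_score) / best
    return best * (0.5 + 0.5 * agreement)    # reward cross-modal agreement


def weighted_pseudo_loss(box_losses, rgb_scores, thermal_scores):
    """Scale each pseudo-label's detection loss by its modality-aware weight,
    so high-quality pseudo-labels contribute more to training."""
    return sum(loss * pseudo_label_weight(r, t)
               for loss, r, t in zip(box_losses, rgb_scores, thermal_scores))
```

Under this toy rule, a box detected at 0.9 in both modalities keeps its full weight, while one detected at 0.9 in RGB but invisible in thermal is halved, capturing the intuition that cross-modal agreement signals a reliable pseudo-label.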