🤖 AI Summary
To address label erasure/incorrect addition (i.e., label noise) induced by CutMix in multi-label remote sensing image classification, this paper proposes a pixel-level position-guided label propagation strategy. Leveraging spatial category distribution information from reference images or class-wise interpretability masks, the method dynamically refines the multi-label annotations of mixed images, ensuring semantically consistent and noise-free label generation. This work is the first to embed a location-aware label propagation mechanism into the CutMix framework—integrating paired positional cues to update multi-labels—thereby significantly enhancing model robustness under inaccurate labeling conditions. Experiments across multiple remote sensing benchmarks demonstrate an average 3.2% improvement in F1-score over baseline methods. Moreover, the proposed approach maintains stable performance under both synthetically simulated and real-world label noise scenarios.
📝 Abstract
The development of supervised deep learning-based methods for multi-label scene classification (MLC) is one of the prominent research directions in remote sensing (RS). Yet, collecting annotations for large RS image archives is time-consuming and costly. To address this issue, several data augmentation methods have been introduced in RS. Among others, the data augmentation technique CutMix, which combines parts of two existing training images to generate an augmented image, stands out as a particularly effective approach. However, the direct application of CutMix in RS MLC can lead to the erasure or addition of class labels (i.e., label noise) in the augmented (i.e., combined) training image. To address this problem, we introduce a label propagation (LP) strategy that allows the effective application of CutMix in the context of MLC problems in RS without being affected by label noise. To this end, our proposed LP strategy exploits pixel-level class positional information to update the multi-label of the augmented training image. We propose to access such class positional information from reference maps associated to each training image (e.g., thematic products) or from class explanation masks provided by an explanation method if no reference maps are available. Similarly to pairing two training images, our LP strategy carries out a pairing operation on the associated pixel-level class positional information to derive the updated multi-label for the augmented image. Experimental results show the effectiveness of our LP strategy in general and its robustness in the case of various simulated and real scenarios with noisy class positional information in particular.