🤖 AI Summary
Scribble-supervised semantic segmentation faces two key challenges: sparse annotations provide only limited supervision, and differences in scribble style across annotators lead to unstable predictions. To address these, we propose the class-driven scribble promotion network, which uses both the scribbles and their associated class labels to generate pseudo-labels. The network introduces two components: a localization rectification module that mitigates noisy labels, and a distance perception module that identifies reliable regions around scribble annotations and pseudo-labels. We further design a scribble simulation algorithm and construct two large-scale benchmarks, ScribbleCOCO and ScribbleCityscapes, which support evaluation across varying scribble styles. Experiments show competitive accuracy and robustness relative to existing approaches under these multi-style scribble settings. The code and datasets will be publicly released.
📝 Abstract
Scribble-based weakly supervised semantic segmentation trains a segmentation model using only a few annotated pixels as labels, which can substantially reduce the human labor involved in annotation. This approach faces two primary challenges: first, the sparsity of scribble annotations provides limited supervision and can lead to inconsistent predictions; second, the variability in scribble annotations, reflecting differing annotator preferences, can prevent the model from consistently capturing the discriminative regions of objects, leading to unstable predictions. To address these issues, we propose a holistic framework, the class-driven scribble promotion network, for robust scribble-supervised semantic segmentation. This framework not only uses the provided scribble annotations but also leverages their associated class labels to generate reliable pseudo-labels. Within the network, we introduce a localization rectification module to mitigate noisy labels and a distance perception module to identify reliable regions surrounding scribble annotations and pseudo-labels. In addition, we introduce new large-scale benchmarks, ScribbleCOCO and ScribbleCityscapes, accompanied by a scribble simulation algorithm that enables evaluation across varying scribble styles. Our method achieves competitive accuracy and robustness compared with existing approaches. The datasets and code will be made publicly available.
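The abstract does not specify how the distance perception module is built. As a rough illustration of the underlying idea only (pixels near a scribble are more trustworthy than pixels far from it), the sketch below computes a distance map from scribble pixels and turns it into a reliability weight. The function names, the 4-connected distance, the exponential decay, and the `sigma` parameter are all assumptions made for this example, not the authors' design.

```python
import math
from collections import deque

IGNORE = -1  # hypothetical marker for unlabeled pixels


def scribble_distance_map(scribble):
    """Multi-source BFS over a 2D grid.

    Returns, for every pixel, the 4-connected (Manhattan) distance to the
    nearest scribble pixel and the class label of that scribble pixel.
    """
    h, w = len(scribble), len(scribble[0])
    dist = [[None] * w for _ in range(h)]
    label = [[IGNORE] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every labeled scribble pixel at distance 0.
    for y in range(h):
        for x in range(w):
            if scribble[y][x] != IGNORE:
                dist[y][x] = 0
                label[y][x] = scribble[y][x]
                queue.append((y, x))
    # Expand outward one step at a time; first visit gives the shortest distance.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                label[ny][nx] = label[y][x]
                queue.append((ny, nx))
    return dist, label


def reliability_weights(dist, sigma=2.0):
    """Assumed weighting: exp(-d / sigma), so weight decays with distance."""
    return [[0.0 if d is None else math.exp(-d / sigma) for d in row]
            for row in dist]


# Toy 3x4 image with one class-0 scribble pixel and one class-1 scribble pixel.
scribble = [
    [IGNORE, IGNORE, IGNORE, IGNORE],
    [0,      IGNORE, IGNORE, 1],
    [IGNORE, IGNORE, IGNORE, IGNORE],
]
dist, label = scribble_distance_map(scribble)
weights = reliability_weights(dist)
```

In a training loop, weights of this kind could scale a per-pixel loss on propagated labels so that supervision far from any scribble counts less; whether the paper does anything similar is not stated in the abstract.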