🤖 AI Summary
To address the high annotation cost of weakly supervised semantic segmentation in autonomous robotics, this paper proposes a model-agnostic Depth Edge Alignment Loss (DEAL) that leverages readily available depth maps to generate high-quality pixel-level pseudo-labels under image-level supervision. DEAL explicitly aligns edge structures between RGB images and depth map gradients, thereby enhancing the spatial consistency of weak supervision signals without requiring additional manual annotations, and it is plug-and-play compatible with mainstream segmentation architectures. Experiments on PASCAL VOC, MS COCO, and HOPE demonstrate consistent improvements in mean Intersection-over-Union (mIoU) of 5.44, 1.27, and 16.42 percentage points, respectively, substantially outperforming existing weakly supervised methods. These results validate the depth modality as an effective geometric prior for guiding weakly supervised semantic segmentation.
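To make the edge-alignment idea concrete, here is a minimal, illustrative sketch of such a loss in NumPy. The paper's exact formulation is not reproduced here; the function names, the Sobel edge extractor, the normalisation, and the L1 penalty are all assumptions chosen for clarity, showing only the general principle of penalising mismatch between segmentation edges and depth edges.

```python
import numpy as np


def sobel_edges(x):
    """Edge magnitude via Sobel filters (valid convolution; illustrative choice)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = x.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = x[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)


def depth_edge_alignment_loss(seg_prob, depth):
    """Penalise mismatch between segmentation edges and depth edges.

    seg_prob: (H, W) foreground probability map from a segmentation model.
    depth:    (H, W) depth map.
    This is a hypothetical L1 alignment between normalised edge magnitudes,
    not the paper's exact loss.
    """
    e_seg = sobel_edges(seg_prob)
    e_depth = sobel_edges(depth)
    # Normalise each edge map to [0, 1] so the two modalities are comparable.
    e_seg = e_seg / (e_seg.max() + 1e-8)
    e_depth = e_depth / (e_depth.max() + 1e-8)
    return float(np.mean(np.abs(e_seg - e_depth)))
```

When the predicted mask's boundaries coincide with depth discontinuities, the loss is near zero; where the mask has edges that the depth map does not support (or vice versa), the loss grows, nudging the weakly supervised predictions toward geometrically consistent boundaries.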
📝 Abstract
Autonomous robotic systems applied to new domains require an abundance of expensive, pixel-level dense labels to train robust semantic segmentation models under full supervision. This study proposes a model-agnostic Depth Edge Alignment Loss to improve Weakly Supervised Semantic Segmentation models across different datasets. The methodology generates pixel-level semantic labels from image-level supervision, avoiding expensive annotation processes. While weak supervision is widely explored in traditional computer vision, our approach adds supervision with pixel-level depth information, a modality commonly available in robotic systems. We demonstrate that our approach improves segmentation performance across datasets and models, and that it can also be combined with other losses for even better performance, with improvements of up to +5.439, +1.274 and +16.416 points in mean Intersection over Union on the PASCAL VOC validation set, the MS COCO validation set, and the HOPE static onboarding split, respectively. Our code will be made publicly available.