🤖 AI Summary
Domain generalization for semantic segmentation faces two challenges: domain shift, and misalignment between diffusion-generated images and their semantic masks. This paper proposes FLEX-Seg (FLexible Edge eXploitation for Segmentation), a framework that treats this misalignment as a learning signal rather than an artifact to be removed. It comprises three components: (1) Granular Adaptive Prototypes, which capture boundary characteristics at multiple scales; (2) Uncertainty Boundary Emphasis, which adjusts the learning emphasis according to prediction entropy; and (3) Hardness-Aware Sampling, which progressively focuses training on challenging examples. Evaluated on five real-world datasets, including ACDC and Dark Zurich, FLEX-Seg consistently improves over state-of-the-art methods, with mIoU gains of 2.44% on ACDC and 2.63% on Dark Zurich.
📝 Abstract
Domain generalization in semantic segmentation faces challenges from domain shifts, particularly under adverse conditions. While diffusion-based data generation methods show promise, they introduce inherent misalignment between generated images and semantic masks. This paper presents FLEX-Seg (FLexible Edge eXploitation for Segmentation), a framework that transforms this limitation into an opportunity for robust learning. FLEX-Seg comprises three key components: (1) Granular Adaptive Prototypes, which capture boundary characteristics across multiple scales, (2) Uncertainty Boundary Emphasis, which dynamically adjusts learning emphasis based on prediction entropy, and (3) Hardness-Aware Sampling, which progressively focuses on challenging examples. By leveraging inherent misalignment rather than enforcing strict alignment, FLEX-Seg learns robust representations while capturing rich stylistic variations. Experiments across five real-world datasets demonstrate consistent improvements over state-of-the-art methods, achieving mIoU gains of 2.44% and 2.63% on ACDC and Dark Zurich, respectively. Our findings validate that adaptive strategies for handling imperfect synthetic data lead to superior domain generalization. Code is available at https://github.com/VisualScienceLab-KHU/FLEX-Seg.
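The abstract does not give formulas, so the following is only an illustrative sketch of the two general ideas it names: weighting a segmentation loss by prediction entropy, and sampling training examples in proportion to their difficulty. It is not the FLEX-Seg implementation. The function names, the weight form `1 + alpha * H / log(C)`, and the softmax-over-losses sampling rule are all assumptions made for illustration, and plain Python lists stand in for image tensors.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weights(prob_map, alpha=1.0):
    """Hypothetical weighting: 1 + alpha * H / log(C), so each weight lies in [1, 1 + alpha].

    prob_map is a list of per-pixel class distributions (each sums to 1).
    In a real training loop these weights would be treated as constants
    (no gradient), which is an assumption of this sketch.
    """
    max_entropy = math.log(len(prob_map[0]))
    return [1.0 + alpha * pixel_entropy(p) / max_entropy for p in prob_map]

def weighted_cross_entropy(prob_map, labels, alpha=1.0, eps=1e-12):
    """Cross-entropy averaged with entropy-based weights, emphasizing uncertain pixels."""
    weights = entropy_weights(prob_map, alpha)
    losses = [-math.log(max(p[y], eps)) for p, y in zip(prob_map, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)

def hardness_sampling_probs(sample_losses, temperature=1.0):
    """Hypothetical sampling rule: softmax over per-sample losses.

    Higher-loss (harder) samples receive a higher sampling probability;
    lowering the temperature sharpens the focus on hard samples.
    """
    m = max(sample_losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / temperature) for l in sample_losses]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Two pixels, two classes: one confident prediction, one maximally uncertain.
    prob_map = [[0.9, 0.1], [0.5, 0.5]]
    labels = [0, 0]
    print(entropy_weights(prob_map))
    print(weighted_cross_entropy(prob_map, labels))
    print(hardness_sampling_probs([0.1, 2.0]))
```

With `alpha=0` the weighted loss reduces to the ordinary mean cross-entropy, which makes the effect of the entropy term easy to isolate. A progressive schedule, as suggested by "progressively focuses", could be approximated by lowering `temperature` over training; that choice is likewise an assumption and not taken from the paper.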