AI Summary
In Domain Generalized Semantic Segmentation (DGSS), synthetic images generated by diffusion models often contain structural and semantic artifacts that degrade segmentation performance and cause error accumulation during training. To address this, the paper proposes inverse evolution layers (IELs), Laplacian-prior-guided modules that highlight spatial discontinuities and semantic inconsistencies. IELs are integrated both into the diffusion generative process (IELDM), to filter undesirable generative patterns and produce higher-quality augmentation data, and into the segmentation decoder (IELFormer), to suppress the propagation of generation artifacts. IELFormer further incorporates a multi-scale frequency fusion (MFF) module that fuses multi-resolution features in the frequency domain to improve cross-scale semantic consistency. Crucially, the method requires no target-domain data, yet it markedly improves generalization: on multiple DGSS benchmarks, IELDM and IELFormer outperform existing methods, demonstrating the effectiveness of inverse evolution in joint generative-segmentation optimization.
Abstract
Domain Generalized Semantic Segmentation (DGSS) aims to train a model on labeled data from a source domain so that it generalizes robustly to unseen target domains at inference time. A common way to improve generalization is to augment the source domain with synthetic data generated by diffusion models (DMs). However, the generated images often contain structural or semantic defects due to imperfections in generative training, and training segmentation models on such flawed data can cause performance degradation and error accumulation. To address this issue, we propose to integrate inverse evolution layers (IELs) into the generative process. IELs use Laplacian-based priors to highlight spatial discontinuities and semantic inconsistencies, enabling more effective filtering of undesirable generative patterns. Building on this mechanism, we introduce IELDM, an enhanced diffusion-based data augmentation framework that produces higher-quality images. We further observe that the defect-suppression capability of IELs also benefits the segmentation network by blocking artifact propagation. Motivated by this insight, we embed IELs into the decoder of the DGSS model and propose IELFormer, which strengthens generalization in cross-domain scenarios. To improve semantic consistency across scales, IELFormer additionally incorporates a multi-scale frequency fusion (MFF) module, which performs frequency-domain analysis to achieve structured integration of multi-resolution features, thereby improving cross-scale coherence. Extensive experiments on benchmark datasets demonstrate that our approach achieves superior generalization performance compared with existing methods.
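To make the Laplacian-based prior concrete, the sketch below shows one common way an inverse evolution layer can be realized: running a discretized heat equation backwards in time. A forward heat step smooths a feature map by adding a scaled Laplacian; subtracting it instead amplifies high-frequency irregularities such as isolated spikes and jagged boundaries, which is the behavior the abstract attributes to IELs. This is a minimal NumPy illustration under our own discretization choices (5-point Laplacian, replicate padding, step size `tau`), not the paper's actual implementation.

```python
import numpy as np

def laplacian(u: np.ndarray) -> np.ndarray:
    """Discrete 5-point Laplacian with replicate (edge) padding."""
    p = np.pad(u, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * u)

def inverse_evolution_layer(u: np.ndarray,
                            tau: float = 0.2,
                            steps: int = 3) -> np.ndarray:
    """Run the heat equation in reverse: u <- u - tau * Lap(u).

    The forward equation (u + tau * Lap(u)) diffuses and smooths;
    negating the step makes spatial discontinuities grow instead,
    so artifacts stand out against smooth regions.
    """
    for _ in range(steps):
        u = u - tau * laplacian(u)
    return u

# A single-pixel artifact in an otherwise flat map is amplified,
# while a perfectly smooth (constant) map passes through unchanged.
u = np.zeros((8, 8))
u[4, 4] = 1.0
amplified = inverse_evolution_layer(u)
```

In a segmentation decoder, such a layer would act as a fixed, physics-informed highlighter: regions where its response is large are candidate generation artifacts, which downstream losses or filtering can then penalize.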