🤖 AI Summary
To address performance degradation in semantic segmentation caused by domain shift between synthetic and real data, this paper proposes the Domain Adversarial Kernel Prediction Network (DA-KPN), a novel unpaired image translation method. Unlike conventional GAN-based frameworks that rely on cycle consistency, DA-KPN introduces learnable pixel-wise kernel parameters and employs a lightweight mapping function to generate spatially adaptive transformations, explicitly enforcing pixel-level semantic consistency between translated images and their synthetic labels. Multi-scale discriminators are integrated into an adversarial training scheme to jointly preserve photorealism and enhance semantic alignment. Experiments show that DA-KPN outperforms state-of-the-art GAN methods on syn-to-real semantic segmentation benchmarks, especially when few real labels are available, and achieves competitive performance on face parsing, supporting its generalizability and practical utility.
📝 Abstract
Semantic segmentation relies on dense pixel-wise annotations to achieve the best performance, but owing to the difficulty of obtaining accurate annotations for real-world data, practitioners train on large-scale synthetic datasets. Unpaired image translation is one method used to address the ensuing domain gap by generating more realistic training data in low-data regimes. Current methods for unpaired image translation train generative adversarial networks (GANs) to perform the translation and enforce pixel-level semantic matching through cycle consistency. These methods do not guarantee that the semantic matching holds, posing a problem for semantic segmentation, where performance is sensitive to noisy pixel labels. We propose a novel image translation method, the Domain Adversarial Kernel Prediction Network (DA-KPN), that guarantees semantic matching between the synthetic label and the translation. DA-KPN estimates pixel-wise input transformation parameters of a lightweight and simple translation function. To ensure the pixel-wise transformation is realistic, DA-KPN uses multi-scale discriminators to distinguish between translated and target samples. We show DA-KPN outperforms previous GAN-based methods on syn2real benchmarks for semantic segmentation with limited access to real image labels and achieves comparable performance on face parsing.
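To make the core idea concrete, the following is a minimal PyTorch sketch of the abstract's mechanism, not the paper's actual architecture: a small network predicts per-pixel parameters of a simple transformation (here assumed to be a per-channel affine scale-and-shift, one plausible instance of a "lightweight and simple translation function"), and patch discriminators run at two scales judge realism. Because the translation acts pixel-wise and never moves content spatially, each translated pixel keeps the semantic label of its source pixel by construction. All module names and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamPredictor(nn.Module):
    """Illustrative stand-in for DA-KPN's parameter prediction branch:
    a tiny conv net that outputs per-pixel scale and shift for each
    color channel (hypothetical choice of translation function)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.channels = channels
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2 * channels, 3, padding=1),  # scale + shift maps
        )

    def forward(self, x):
        params = self.net(x)
        scale, shift = params.split(self.channels, dim=1)
        return 1.0 + scale, shift  # parameterize near the identity

def translate(x, predictor):
    """Lightweight pixel-wise translation: y = scale * x + shift.
    No spatial warping, so synthetic labels stay pixel-aligned."""
    scale, shift = predictor(x)
    return scale * x + shift

class MultiScaleDiscriminator(nn.Module):
    """Two small patch discriminators, one at full and one at half
    resolution, sketching the multi-scale adversarial critic."""
    def __init__(self, channels: int = 3):
        super().__init__()
        def patch_d():
            return nn.Sequential(
                nn.Conv2d(channels, 16, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(16, 1, 4, stride=2, padding=1),  # patch logits
            )
        self.d_full, self.d_half = patch_d(), patch_d()

    def forward(self, x):
        return self.d_full(x), self.d_half(F.avg_pool2d(x, 2))

# Usage: translate a synthetic image and score it at two scales.
x = torch.rand(1, 3, 64, 64)            # synthetic input image
y = translate(x, ParamPredictor())      # translated image, same layout
logits_full, logits_half = MultiScaleDiscriminator()(y)
```

In training, the predictor would be optimized so the discriminators cannot tell translated images from real ones, while the identity-preserving pixel-wise form of `translate` keeps the synthetic segmentation labels valid for the output.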