🤖 AI Summary
This work addresses the challenge of semantic segmentation in microscopy images, which is hindered by the scarcity and high cost of manual annotations, particularly in high-throughput materials characterization. To circumvent the need for real annotated data, the authors propose a novel unsupervised domain adaptation framework that first leverages phase-field simulations to generate synthetic microstructure images with perfect pixel-wise masks, then employs CycleGAN to translate these synthetic images into realistic scanning electron microscopy (SEM) images without paired supervision. A U-Net segmentation model is subsequently trained on the translated images. This approach uniquely integrates physics-based simulation with unpaired image-to-image translation, eliminating reliance on real-world annotations entirely. Experimental results demonstrate that the model achieves a boundary F1-score of 0.90 and an IoU of 0.88 on unseen real SEM images, with synthetic images exhibiting statistical and feature distributions closely matching those of real data.
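The two headline metrics, Intersection over Union and boundary F1-score, can be computed directly from binary masks. The sketch below is a minimal, hypothetical implementation of both (not the authors' code): IoU is overlap over union, and boundary F1 matches boundary pixels of prediction and ground truth within a small pixel tolerance `tol`, an assumed parameter here.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def iou(pred, gt):
    """Intersection over Union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = (pred | gt).sum()
    return (pred & gt).sum() / union if union else 1.0

def boundary_f1(pred, gt, tol=2):
    """Boundary F1: precision/recall of boundary pixels matched within `tol` px."""
    def boundary(m):
        # Boundary = mask pixels that disappear under one erosion step.
        m = m.astype(bool)
        return m & ~binary_erosion(m)
    bp, bg = boundary(pred), boundary(gt)
    # A boundary pixel counts as matched if it lies within a tol-pixel band
    # around the other mask's boundary (band built by repeated dilation).
    precision = (bp & binary_dilation(bg, iterations=tol)).sum() / max(bp.sum(), 1)
    recall = (bg & binary_dilation(bp, iterations=tol)).sum() / max(bg.sum(), 1)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: a predicted square mask shifted one pixel from the ground truth.
gt = np.zeros((32, 32), bool); gt[8:24, 8:24] = True
pred = np.zeros((32, 32), bool); pred[9:25, 9:25] = True
print(round(iou(pred, gt), 3))        # overlap shrinks with the shift
print(round(boundary_f1(pred, gt)))   # boundaries still match within tol=2
```

With a one-pixel shift the IoU drops noticeably while the boundary F1 stays perfect, which is why the paper reports both: boundary F1 is forgiving of small localisation error, IoU is not.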
📝 Abstract
Semantic segmentation of microscopy images is a critical task for high-throughput materials characterisation, yet its automation is severely constrained by the prohibitive cost, subjectivity, and scarcity of expert-annotated data. While physics-based simulations offer a scalable alternative to manual labelling, models trained on such data historically fail to generalise due to a significant domain gap: simulations lack the complex textures, noise patterns, and imaging artefacts inherent to experimental data. This paper introduces a novel framework for labour-free segmentation that successfully bridges this simulation-to-reality gap. Our pipeline leverages phase-field simulations to generate an abundant source of microstructural morphologies with perfect, intrinsically derived ground-truth masks. We then employ a Cycle-Consistent Generative Adversarial Network (CycleGAN) for unpaired image-to-image translation, transforming the clean simulations into a large-scale dataset of high-fidelity, realistic SEM images. A U-Net model, trained exclusively on this synthetic data, demonstrated remarkable generalisation when deployed on unseen experimental images, achieving a mean boundary F1-score of 0.90 and an Intersection over Union (IoU) of 0.88. Comprehensive validation using t-SNE feature-space projection and Shannon entropy analysis confirms that our synthetic images are indistinguishable from the real data manifold in both image statistics and learned feature space. By completely decoupling model training from manual annotation, our generative framework transforms a data-scarce problem into one of data abundance, providing a robust and fully automated solution to accelerate materials discovery and analysis.
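The Shannon entropy check mentioned above compares the grey-level histogram entropy of synthetic and real images; similar distributions suggest the CycleGAN output carries realistic texture complexity. A minimal sketch, assuming 8-bit greyscale images (the helper and the toy images are hypothetical, not the paper's data):

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Shannon entropy (bits) of an image's grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()          # probabilities of occupied bins
    return float(-(p * np.log2(p)).sum())

# A constant image carries no information; uniform noise approaches the
# 8-bit maximum of 8 bits. Real and well-translated SEM images should
# land at similar intermediate values.
flat = np.full((64, 64), 128, dtype=np.uint8)
rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)
print(shannon_entropy(flat))    # minimal entropy
print(shannon_entropy(noise))   # near the 8-bit ceiling
```

In a validation like the paper describes, one would compute this entropy over batches of real and synthetic images and compare the two distributions rather than single values.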