🤖 AI Summary
This work addresses the long-standing limitations in scanning electron microscopy (SEM) image analysis, which has traditionally relied on task-specific models and labor-intensive manual workflows, lacking generalizability across diverse materials and imaging conditions. The authors propose the first self-supervised foundation model tailored for SEM images, leveraging a large-scale, multi-device, and multi-condition dataset for pretraining. Built upon a Transformer architecture enhanced with a Mixture-of-Experts (MoE) mechanism, the model learns transferable representations without requiring paired data, enabling high-quality translation from defocused to focused images. It further supports flexible fine-tuning across a range of downstream tasks. Experimental results demonstrate that the model significantly outperforms existing approaches across multiple metrics, confirming its strong generalization capability and practical utility.
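The summary names a Mixture-of-Experts (MoE) mechanism but does not say how tokens are routed. As a rough illustration only, the sketch below shows one common form of MoE routing (top-k gating over linear experts) in plain Python. The function names, the toy sizes, the choice of top-k gating, and the linear experts are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def moe_layer(token, gate_weights, expert_weights, top_k=2):
    """Route one token embedding through its top_k experts (hypothetical helper).

    gate_weights:   num_experts x dim matrix producing one score per expert.
    expert_weights: list of dim x dim matrices, one linear expert each.
    Returns the gate-weighted sum of the selected experts' outputs and
    the indices of the experts that were used.
    """
    gate_probs = softmax(matvec(gate_weights, token))
    # Keep only the top_k most probable experts for this token.
    ranked = sorted(range(len(gate_probs)), key=lambda i: gate_probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the kept gate probabilities so they sum to 1.
    kept_mass = sum(gate_probs[i] for i in chosen)
    output = [0.0] * len(token)
    for i in chosen:
        weight = gate_probs[i] / kept_mass
        expert_out = matvec(expert_weights[i], token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


def random_matrix(rng, rows, cols):
    """Build a rows x cols matrix of uniform random weights (toy initialization)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 3  # assumed toy sizes, not from the paper
gate = random_matrix(rng, NUM_EXPERTS, DIM)
experts = [random_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]  # stand-in for one image-patch embedding
mixed, used = moe_layer(token, gate, experts, top_k=2)
```

Only the selected experts run for a given token, which is why MoE layers can add capacity without a proportional increase in per-token compute. In a real Transformer the experts would typically be feed-forward blocks rather than single matrices.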
📝 Abstract
Scanning Electron Microscopy (SEM) is indispensable in modern materials science, enabling high-resolution imaging across a wide range of structural, chemical, and functional investigations. However, SEM imaging remains constrained by task-specific models and labor-intensive acquisition processes that limit its scalability across diverse applications. Here, we introduce the first foundation model for SEM images, pretrained on a large corpus of multi-instrument, multi-condition scientific micrographs, enabling generalization across diverse material systems and imaging conditions. Leveraging a self-supervised transformer architecture, our model learns rich and transferable representations that can be fine-tuned or adapted to a wide range of downstream tasks. As a compelling demonstration, we focus on defocus-to-focus image translation, an essential yet underexplored challenge in automated microscopy pipelines. Our method not only restores focused detail from defocused inputs without paired supervision but also outperforms state-of-the-art techniques across multiple evaluation metrics. This work lays the groundwork for a new class of adaptable SEM models, accelerating materials discovery by bridging foundational representation learning with real-world imaging needs.