🤖 AI Summary
This work addresses the large modality gap between structured illumination microscopy (SIM) and conventional H&E-stained histopathology images, which hinders direct transfer of existing pathology foundation models to SIM. To bridge this gap, the authors propose SIMPLER, a cross-modal self-supervised pretraining framework that uses H&E images as semantic anchors. By jointly optimizing adversarial alignment, contrastive learning, and reconstruction losses, the method carries the structural priors encoded in H&E into SIM representation learning. The effect is asymmetric enrichment: SIM feature representations improve while H&E performance is preserved. The resulting single encoder consistently outperforms SIM models trained from scratch or pretrained solely on H&E on downstream tasks such as multiple instance learning and morphological clustering, demonstrating the effectiveness and generalizability of the proposed cross-modal alignment strategy.
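To make the joint objective concrete, the sketch below combines an adversarial, a contrastive, and a reconstruction term into one weighted loss. It illustrates the general idea only and is not the authors' implementation. The function names, the loss weights, the temperature, the choice of InfoNCE as the contrastive term, the modality-discriminator formulation of the adversarial term, and the assumption that row i of the SIM embeddings corresponds to row i of the H&E embeddings are all hypothetical; none of them is stated in the summary or the abstract.

```python
import math


def mse(x, y):
    """Reconstruction term: mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(sim_emb, he_emb, temperature=0.1):
    """Contrastive term (InfoNCE, assumed here).

    Assumption: sim_emb[i] and he_emb[i] form a positive pair, and every
    other H&E embedding in the batch acts as a negative.
    """
    total = 0.0
    for i, z in enumerate(sim_emb):
        logits = [cosine(z, h) / temperature for h in he_emb]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(sim_emb)


def confusion_loss(p_sim_given_sim_emb):
    """Encoder-side adversarial term (assumed formulation).

    Each input is a hypothetical discriminator's probability that a SIM
    embedding came from SIM. The encoder is rewarded when the discriminator
    is fooled, i.e. when these probabilities are low.
    """
    eps = 1e-12
    n = len(p_sim_given_sim_emb)
    return -sum(math.log(max(1.0 - p, eps)) for p in p_sim_given_sim_emb) / n


def pretraining_loss(adv, con, rec, w_adv=1.0, w_con=1.0, w_rec=1.0):
    """Weighted sum of the three terms; the weights are illustrative."""
    return w_adv * adv + w_con * con + w_rec * rec


# Toy usage with two paired 2-D embeddings.
sim_emb = [[1.0, 0.0], [0.0, 1.0]]
he_emb = [[1.0, 0.0], [0.0, 1.0]]
loss = pretraining_loss(
    adv=confusion_loss([0.5, 0.5]),
    con=info_nce(sim_emb, he_emb),
    rec=mse([0.2, 0.4], [0.2, 0.5]),
)
```

In this toy setup the contrastive term is close to zero because each SIM embedding already matches its paired H&E embedding, so the total is dominated by the adversarial term. In a real training loop each term would be computed from encoder, decoder, and discriminator outputs in an autodiff framework.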
📝 Abstract
Structured Illumination Microscopy (SIM) enables rapid, high-contrast optical sectioning of fresh tissue without staining or physical sectioning, making it promising for intraoperative and point-of-care diagnostics. Recent foundation and large-scale self-supervised models in digital pathology have demonstrated strong performance on section-based modalities such as Hematoxylin and Eosin (H&E) and immunohistochemistry (IHC). However, these approaches are predominantly trained on thin tissue sections and do not explicitly address thick-tissue fluorescence modalities such as SIM. When transferred directly to SIM, performance is constrained by substantial modality shift, and naive fine-tuning often overfits to modality-specific appearance rather than underlying histological structure. We introduce SIMPLER (Structured Illumination Microscopy-Powered Learning for Embedding Representations), a cross-modality self-supervised pretraining framework that leverages H&E as a semantic anchor to learn reusable SIM representations. H&E encodes rich cellular and glandular structure aligned with established clinical annotations, while SIM provides rapid, nondestructive imaging of fresh tissue. During pretraining, SIM and H&E are progressively aligned through adversarial, contrastive, and reconstruction-based objectives, encouraging SIM embeddings to internalize histological structure from H&E without collapsing modality-specific characteristics. A single pretrained SIMPLER encoder transfers across multiple downstream tasks, including multiple instance learning and morphological clustering, consistently outperforming SIM models trained from scratch or with H&E-only pretraining. Importantly, joint alignment enhances SIM performance without degrading H&E representations, demonstrating asymmetric enrichment rather