🤖 AI Summary
Microscopy image datasets are typically limited in scale, hindering the training of robust deep learning models, and the cross-domain transferability of self-supervised learning across varying staining protocols and channel configurations remains unclear. This work presents the first systematic evaluation of the cross-domain generalization of Vision Transformers pretrained with DINO self-supervised learning in the microscopy domain. Models pretrained on ImageNet-1k, the Human Protein Atlas (HPA), and OpenCell were evaluated via supervised classification heads for protein localization on OpenCell. Results show that the HPA-pretrained model achieves the best performance on OpenCell (macro F1: 0.8221 ± 0.0062), slightly surpassing even the model pretrained on the target domain itself (0.8057 ± 0.0090), indicating that large-scale, domain-relevant pretraining can enhance cross-domain generalization.
📝 Abstract
Task-specific microscopy datasets are often too small to train deep learning models that learn robust feature representations. Self-supervised learning (SSL) can mitigate this by pretraining on large unlabeled datasets, but it remains unclear how well such representations transfer across microscopy domains with different staining protocols and channel configurations. We investigate the cross-domain transferability of DINO-pretrained Vision Transformers for protein localization on the OpenCell dataset. We generate image embeddings using three DINO backbones pretrained on ImageNet-1k, the Human Protein Atlas (HPA), and OpenCell, and evaluate them by training a supervised classification head on OpenCell labels. All pretrained models transfer well, with the microscopy-specific HPA-pretrained model achieving the best performance (mean macro $F_1$-score = 0.8221 $\pm$ 0.0062), slightly outperforming a DINO model trained directly on OpenCell (0.8057 $\pm$ 0.0090). These results highlight the value of large-scale pretraining and indicate that domain-relevant SSL representations can generalize effectively to related but distinct microscopy datasets, enabling strong downstream performance even when task-specific labeled data are limited.
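The evaluation protocol described above (freezing a pretrained backbone, embedding the images, and training a supervised classification head scored by macro $F_1$) can be sketched as a simple linear probe. This is an illustrative reconstruction, not the paper's code: the embeddings, dimensionality, and five single-label localization classes below are synthetic stand-ins for DINO features of OpenCell images.

```python
# Hedged sketch of a linear-probe evaluation on frozen SSL embeddings.
# All data here are synthetic placeholders; in the paper's setup the
# features would come from a DINO-pretrained ViT applied to OpenCell
# images, and the labels from protein-localization annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in embeddings (384-dim, roughly ViT-S-sized) and 5 hypothetical
# localization classes; class-dependent offsets give the probe signal.
X = rng.normal(size=(600, 384))
y = rng.integers(0, 5, size=600)
X += np.eye(5)[y] @ (rng.normal(size=(5, 384)) * 2.0)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Supervised classification head on frozen features: a linear probe.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Macro F1 averages per-class F1 scores equally, matching the metric
# reported in the paper.
macro_f1 = f1_score(y_te, clf.predict(X_te), average="macro")
print(f"macro F1 = {macro_f1:.4f}")
```

Swapping in real embeddings only requires replacing `X` and `y`; because the backbone stays frozen, comparing pretraining sources (ImageNet-1k, HPA, OpenCell) reduces to rerunning this probe on each embedding set.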