Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: task-specific microscopy datasets are often too small to train robust deep models, and it is unclear how well self-supervised models generalize across datasets that differ in staining protocol and channel configuration. The study takes DINO-based Vision Transformers (ViTs) pretrained with self-supervision on ImageNet-1k or Human Protein Atlas (HPA) field-of-view (FOV) images and evaluates their embeddings on the OpenCell protein localization task, both zero-shot (without fine-tuning) and after fine-tuning on varying fractions of the data. Two strategies for handling the channel mismatch between pretraining and target images are compared. The HPA FOV-pretrained model achieves the highest zero-shot macro F1 score on OpenCell (0.822 ± 0.007), and fine-tuning raises performance to 0.860 ± 0.013. On a labeled single-cell subset of OpenCell, the HPA single-cell-pretrained model gives the best k-nearest-neighbor performance at every neighborhood size tested (macro F1 ≥ 0.796). These results indicate that self-supervised pretraining on large, domain-relevant data transfers well to small, task-specific microscopy datasets.
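
For readers unfamiliar with the metric, the standard single-label definition of macro F1 is given here as a reminder. The source does not state the exact averaging details or what the ± values denote, so the symbols $C$ (number of localization classes) and $TP_c$, $FP_c$, $FN_c$ (per-class true positives, false positives, false negatives) are assumptions introduced for illustration only.

$$F_1^{(c)} = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}, \qquad \text{macro } F_1 = \frac{1}{C}\sum_{c=1}^{C} F_1^{(c)}$$

Because every class contributes equally to the average, rare localization classes weigh as much as common ones.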

📝 Abstract

Background: Task-specific microscopy datasets are often small, making it difficult to train deep learning models that learn robust features. While self-supervised learning (SSL) has shown promise through pretraining on large, domain-specific datasets, generalizability across datasets with differing staining protocols and channel configurations remains underexplored. We investigated the generalizability of SSL models pretrained on ImageNet-1k and HPA FOV, evaluating their embeddings on OpenCell with and without fine-tuning, two channel-mismatch strategies, and varying fine-tuning data fractions. We additionally analyzed single-cell embeddings on a labeled OpenCell subset.

Results: DINO-based ViT backbones pretrained on HPA FOV or ImageNet-1k transfer well to OpenCell even without fine-tuning. The HPA FOV-pretrained model achieved the highest zero-shot performance (macro $F_1$ 0.822 $\pm$ 0.007). Fine-tuning further improved performance to 0.860 $\pm$ 0.013. At the single-cell level, the HPA single-cell-pretrained model achieved the highest k-nearest neighbor performance across all neighborhood sizes (macro $F_1$ $\geq$ 0.796).

Conclusion: SSL methods like DINO, pretrained on large domain-relevant datasets, enable effective use of deep learning features for fine-tuning on small, task-specific microscopy datasets.
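
The k-nearest neighbor evaluation of frozen embeddings mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names `knn_predict` and `macro_f1`, the use of cosine similarity, majority voting, and single-label integer classes are all assumptions, and the toy vectors stand in for real ViT embeddings.

```python
import numpy as np


def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Predict a class for each query embedding by majority vote among its
    k most cosine-similar training embeddings (assumed setup, see lead-in)."""
    train_emb = np.asarray(train_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    train_labels = np.asarray(train_labels, dtype=int)

    # L2-normalize rows so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = query @ train.T  # shape: (n_query, n_train)

    # Indices of the k most similar training samples for each query.
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    nn_labels = train_labels[nn_idx]  # shape: (n_query, k)

    # Count votes per class; ties resolve to the lowest class index.
    n_classes = int(train_labels.max()) + 1
    votes = np.stack([np.bincount(row, minlength=n_classes) for row in nn_labels])
    return votes.argmax(axis=1)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores (single-label case)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy 2-D "embeddings": class 0 lies near the x-axis, class 1 near the y-axis.
    train_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    train_labels = [0, 0, 1, 1]
    query_emb = [[1.0, 0.05], [0.05, 1.0]]
    pred = knn_predict(train_emb, train_labels, query_emb, k=3)
    print(pred, macro_f1([0, 1], pred, n_classes=2))
```

Sweeping `k` over several values and reporting `macro_f1` for each would mirror the "across all neighborhood sizes" comparison, under the same assumptions.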

Problem

Research questions and friction points this paper is trying to address.

protein localization

self-supervised learning

microscopy datasets

domain generalization

small data

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning

DINO

Vision Transformer