A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to deep learning on light sheet fluorescence microscopy (LSM) data. First, the data are high-dimensional, voluminous, and costly to annotate. Second, no modality-specific 3D foundation models exist. To address both, the work introduces the first multimodal 3D foundation model tailored for LSM. Built on a 3D Transformer architecture, the model jointly optimizes masked autoencoding and image-text contrastive learning on a large, unlabeled dataset spanning multiple species, staining protocols, and imaging conditions. The resulting transferable voxel-level representations substantially reduce reliance on labeled data and serve as a shared backbone for diverse downstream tasks, including segmentation, classification, and deblurring. Under both standard quantitative metrics and qualitative expert assessment, the model consistently outperforms existing methods.
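The joint pretraining objective can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The cubic patch size of 8, the 75% mask ratio, the temperature of 0.07, the weight `lam`, and every function name are hypothetical. Random arrays stand in for the 3D Transformer's decoder output and for the image and text embeddings. The sketch shows two things. The reconstruction term is scored only on masked patches, as in masked autoencoding. The contrastive term is a symmetric, CLIP-style InfoNCE loss over matched image-text pairs, which is assumed here as one common form of image-text contrastive learning.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into flattened, non-overlapping cubic patches.

    Assumes every dimension is divisible by `patch`; returns (num_patches, patch**3).
    """
    d, h, w = volume.shape
    p = patch
    v = volume.reshape(d // p, p, h // p, p, w // p, p)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # move the three patch-grid axes to the front
    return v.reshape(-1, p * p * p)


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked patches only (mask must contain a True)."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style contrastive loss; row i of each matrix is a matched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def cross_entropy_diag(l):
        # Numerically stable log-softmax per row, scored on the matching (diagonal) entry.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


def joint_loss(pred, target, mask, img_emb, txt_emb, lam=1.0):
    """Hypothetical weighted sum of the two pretraining objectives."""
    return masked_reconstruction_loss(pred, target, mask) + lam * symmetric_info_nce(img_emb, txt_emb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vol = rng.standard_normal((32, 32, 32))
    target = patchify_3d(vol, 8)                   # 64 patches of 512 voxels each
    mask = rng.random(len(target)) < 0.75          # hide roughly 75% of the patches
    pred = target + 0.1 * rng.standard_normal(target.shape)  # stand-in for decoder output
    img_emb = rng.standard_normal((4, 16))         # stand-in image embeddings
    txt_emb = img_emb + 0.05 * rng.standard_normal((4, 16))  # stand-in text embeddings
    print(joint_loss(pred, target, mask, img_emb, txt_emb))
```

Scoring reconstruction only on hidden patches keeps the encoder from solving the task by copying visible voxels. The weight `lam` would, under these assumptions, trade off voxel-level fidelity against alignment with text.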

📝 Abstract

Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the size, dimensionality, and annotation burden of LSM data make supervised deep learning approaches costly and difficult to scale. Additionally, despite the abundance of unannotated LSM volumes, foundation models for this modality remain underexplored due to computational challenges and the complexity of volumetric representation learning. In this work, we introduce a 3D foundation model for LSM data, pretrained on a large curated collection of 3D images spanning multiple organisms, stains, and imaging protocols. We learn transferable volumetric representations by jointly optimizing for masked reconstruction and image-text alignment. The pretrained backbone drastically reduces the annotation burden, enabling efficient, few-shot adaptation for varied downstream tasks. We evaluate this approach on downstream segmentation, classification, and deblurring. Our results demonstrate consistent improvements over baselines, both (1) when measured using standard evaluation metrics and (2) when rigorously assessed by domain experts. This highlights the potential of foundation model pretraining to reduce annotation requirements while improving performance across diverse LSM analysis tasks. Pretrained model weights and code for pretraining and finetuning are publicly available: https://github.com/AdinaScheinfeld/lsm_fm_public_repo.git.
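To illustrate how a frozen pretrained backbone can support few-shot classification, the sketch below uses a nearest-prototype (class-mean) classifier. It makes no claim about the authors' few-shot procedure, which ships as finetuning code in their repository. The prototype classifier is a deliberately simple stand-in. The names `fit_prototypes` and `predict` are hypothetical. Synthetic vectors replace real encoder embeddings, and the sketch covers classification only, not segmentation or deblurring.

```python
import numpy as np


def fit_prototypes(support_emb, support_labels):
    """Average the L2-normalized embeddings of each class's few labeled examples."""
    emb = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    classes = np.unique(support_labels)  # sorted class ids
    protos = np.stack([emb[support_labels == c].mean(axis=0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return classes, protos


def predict(query_emb, classes, protos):
    """Assign each query to the class whose prototype has the highest cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    return classes[np.argmax(q @ protos.T, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for frozen encoder embeddings: 3 classes, 5 shots each.
    centers = rng.standard_normal((3, 64))
    labels = np.repeat(np.arange(3), 5)
    support = centers[labels] + 0.1 * rng.standard_normal((15, 64))
    queries = centers + 0.1 * rng.standard_normal((3, 64))
    classes, protos = fit_prototypes(support, labels)
    print(predict(queries, classes, protos))
```

Because only class means are estimated, no backbone weights change. This makes the approach a useful baseline for checking whether pretrained representations already separate classes before any finetuning.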

Problem

Research questions and friction points this paper is trying to address.

Light sheet fluorescence microscopy

3D foundation model

annotation burden

volumetric representation learning

few-shot learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D foundation model

light sheet fluorescence microscopy

few-shot learning