🤖 AI Summary
This work addresses the challenges of light-sheet microscopy (LSM) data (high dimensionality, large volume, high annotation cost, and the absence of modality-specific 3D foundation models) by introducing the first multimodal 3D foundation model tailored for LSM. Built on a 3D Transformer architecture, the model jointly optimizes masked autoencoding and image-text contrastive learning on a large-scale, unannotated dataset spanning multiple species, staining protocols, and imaging conditions. The resulting voxel-level representations transfer across tasks: they reduce reliance on labeled data and let a single pretrained backbone support segmentation, classification, and deblurring. Evaluated with standard quantitative metrics and by domain experts, the model consistently improves over baselines.
📝 Abstract
Light sheet fluorescence microscopy (LSM) enables high-resolution, three-dimensional (3D) imaging of biological specimens, providing rich volumetric data for studying cellular organization, pathology, and vascular networks. However, the size, dimensionality, and annotation burden of LSM data make supervised deep learning approaches costly and difficult to scale. Additionally, despite the abundance of unannotated LSM volumes, foundation models for this modality remain underexplored due to computational challenges and the complexity of volumetric representation learning. In this work, we introduce a 3D foundation model for LSM data, pretrained on a large curated collection of 3D images spanning multiple organisms, stains, and imaging protocols. We learn transferable volumetric representations by jointly optimizing for masked reconstruction and image-text alignment. The pretrained backbone drastically reduces the annotation burden, enabling efficient, few-shot adaptation for varied downstream tasks. We evaluate this approach on downstream segmentation, classification, and deblurring. Our results demonstrate consistent improvements over baselines both (1) when measured using standard evaluation metrics and (2) when rigorously assessed by domain experts. These results highlight the potential of foundation model pretraining to reduce annotation requirements while improving performance across diverse LSM analysis tasks. Pretrained model weights and code for pretraining and finetuning are publicly available: https://github.com/AdinaScheinfeld/lsm_fm_public_repo.git.
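
To make the joint objective concrete, here is a minimal sketch of how a masked-reconstruction term and an image-text alignment term might be combined into one training loss. The abstract does not specify the exact loss forms, so everything here is an assumption for illustration: mean squared error over masked voxels, a symmetric InfoNCE-style contrastive loss, and the function names, the `weight` parameter, and the `temperature` default. It uses plain Python lists rather than a deep learning framework and is not the authors' implementation.

```python
import math


def masked_mse(target, recon, mask):
    """Mean squared error over masked entries only (mask value 1 = hidden from the encoder)."""
    num = sum(m * (t - r) ** 2 for t, r, m in zip(target, recon, mask))
    den = sum(mask)
    return num / den if den else 0.0


def _normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Average cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: the i-th image should match the i-th text."""
    img = [_normalize(v) for v in img_emb]
    txt = [_normalize(v) for v in txt_emb]
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


def joint_loss(target, recon, mask, img_emb, txt_emb, weight=1.0):
    """Reconstruction term plus a weighted alignment term (weight is hypothetical)."""
    return masked_mse(target, recon, mask) + weight * contrastive_loss(img_emb, txt_emb)
```

In this sketch the reconstruction term only scores voxels the encoder never saw, while the alignment term pulls each volume embedding toward its paired text embedding and away from the others in the batch. How the two terms are actually balanced in the released code is not stated in the abstract.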