🤖 AI Summary
Existing electron microscopy (EM) image segmentation benchmarks are built on small, curated datasets and fail to capture the high morphological heterogeneity of organelles and the large-scale spatial context found in real-world data, which limits model generalization. To address this, this work introduces a large-scale multi-organelle instance segmentation benchmark comprising over 100,000 2D EM images across diverse cell types and five organelle classes. It proposes a connectivity-aware 3D label propagation algorithm (3D LPA), combined with expert correction, to efficiently generate high-quality 3D instance annotations. Benchmark evaluation reveals that current state-of-the-art models, including U-Net, SAM variants, and Mask2Former, struggle significantly with globally distributed structures such as the endoplasmic reticulum, highlighting a fundamental gap between local-context modeling approaches and the demands of real-world biological complexity.
📝 Abstract
Accurate instance-level segmentation of organelles in electron microscopy (EM) is critical for quantitative analysis of subcellular morphology and inter-organelle interactions. However, current benchmarks, based on small, curated datasets, fail to capture the inherent heterogeneity and large spatial context of in-the-wild EM data, which impose fundamental limitations on current patch-based methods. To address these limitations, we developed a large-scale, multi-source benchmark for multi-organelle instance segmentation, comprising over 100,000 2D EM images across a variety of cell types and five organelle classes, capturing real-world variability. Dataset annotations were generated by our connectivity-aware 3D Label Propagation Algorithm (3D LPA) and refined by experts. We further benchmarked several state-of-the-art models, including U-Net, SAM variants, and Mask2Former. Our results reveal several limitations: current models struggle to generalize across heterogeneous EM data and perform poorly on organelles with global, distributed morphologies (e.g., the endoplasmic reticulum). These findings underscore the fundamental mismatch between local-context models and the challenge of modeling long-range structural continuity in the presence of real-world variability. The benchmark dataset and labeling tool will be publicly released soon.
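The abstract does not describe the internals of 3D LPA, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea that connectivity across slices can turn a stack of 2D foreground masks into 3D instance labels, here using plain 6-connected flood fill. The function name `label_3d` and the nested-list volume layout `[z][y][x]` are assumptions made for illustration.

```python
from collections import deque


def label_3d(volume):
    """Give each 6-connected foreground component its own integer label.

    volume: nested list indexed [z][y][x]; truthy values are foreground.
    Returns a nested list of the same shape, with 0 for background.
    (Hypothetical helper for illustration; not the paper's 3D LPA.)
    """
    depth = len(volume)
    height = len(volume[0]) if depth else 0
    width = len(volume[0][0]) if height else 0
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]

    # Face-adjacent neighbours: within a slice and to the slices above/below.
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    next_label = 0

    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if not volume[z][y][x] or labels[z][y][x]:
                    continue
                # Unlabelled foreground voxel: start a new instance and
                # propagate its label to everything connected to it.
                next_label += 1
                labels[z][y][x] = next_label
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbors:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and volume[nz][ny][nx] and not labels[nz][ny][nx]):
                            labels[nz][ny][nx] = next_label
                            queue.append((nz, ny, nx))
    return labels


# Two 1x3 slices: the left voxel continues into the second slice,
# the right voxel appears only in the first slice.
stack = [[[1, 0, 1]], [[1, 0, 0]]]
print(label_3d(stack))
```

In this toy stack the left column receives a single label spanning both slices, while the isolated right voxel becomes a separate instance. A practical pipeline for thin, branching structures such as the endoplasmic reticulum would presumably need more than raw connectivity, for example the expert refinement step the abstract mentions.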