🤖 AI Summary
To address the heavy reliance on manual annotations for fine-grained trait and part segmentation in biological specimen images, this paper proposes a novel "static segmentation as tracking" paradigm: given only one annotated image per species, static segmentation is reformulated as a cross-image mask propagation task via pseudo-video construction. The method builds upon SAM 2 and integrates three key components: pseudo-video generation, cross-image mask propagation, and cycle-consistent self-supervised fine-tuning, enabling efficient adaptation from a single annotated image. To the authors' knowledge, this is the first approach to achieve fine-grained segmentation in a one-shot setting (one annotated image per species), attaining high accuracy on tasks such as butterfly wing-pattern and beetle body-segment segmentation. The method further extends to one-shot instance segmentation on field-captured images and to trait-driven image retrieval. The framework substantially reduces annotation cost while improving the efficiency and scalability of biological trait analysis.
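The pipeline above can be sketched end to end in a few lines. The sketch below is a minimal, self-contained NumPy illustration of the "pseudo-video + propagation" idea only: `propagate_mask` is a placeholder tracker (best integer translation between consecutive frames), standing in for SAM 2's memory-based video tracker, which the paper actually uses. All function names here are illustrative, not from the paper's code.

```python
import numpy as np

def build_pseudo_video(labeled_img, unlabeled_imgs):
    """Concatenate the one annotated image with unlabeled specimen
    images so they can be treated as consecutive video frames."""
    return [labeled_img] + list(unlabeled_imgs)

def propagate_mask(prev_img, prev_mask, next_img, shift_range=3):
    """Placeholder tracker: find the integer translation that best
    aligns prev_img to next_img, then apply it to the mask.
    (SST relies on SAM 2's learned video tracker instead.)"""
    best, best_err = (0, 0), np.inf
    for dy in range(-shift_range, shift_range + 1):
        for dx in range(-shift_range, shift_range + 1):
            err = np.abs(np.roll(prev_img, (dy, dx), axis=(0, 1)) - next_img).sum()
            if err < best_err:
                best, best_err = (dy, dx), err
    return np.roll(prev_mask, best, axis=(0, 1))

def sst_sketch(labeled_img, labeled_mask, unlabeled_imgs):
    """Propagate the single annotated mask through the pseudo-video,
    chaining predicted masks from each 'pseudo-preceding' frame."""
    frames = build_pseudo_video(labeled_img, unlabeled_imgs)
    masks, mask = [labeled_mask], labeled_mask
    for prev, nxt in zip(frames[:-1], frames[1:]):
        mask = propagate_mask(prev, mask, nxt)
        masks.append(mask)
    return masks  # one mask per frame, including the annotated one
```

For example, if an unlabeled specimen is the annotated one shifted by two rows and one column, the propagated mask comes out shifted by the same amount.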
📝 Abstract
We study image segmentation in the biological domain, particularly trait and part segmentation from specimen images (e.g., butterfly wing stripes or beetle body parts). This is a crucial, fine-grained task that aids in understanding the biology of organisms. The conventional approach involves hand-labeling masks, often for hundreds of images per species, and training a segmentation model to generalize these labels to other images, which can be exceedingly laborious. We present a label-efficient method named Static Segmentation by Tracking (SST). SST is built upon the insight that while specimens of the same species have inherent variations, the traits and parts we aim to segment show up consistently. This motivates us to concatenate specimen images into a "pseudo-video" and reframe trait and part segmentation as a tracking problem. Concretely, SST generates masks for unlabeled images by propagating annotated or predicted masks from the "pseudo-preceding" images. Powered by Segment Anything Model 2 (SAM 2), initially developed for video segmentation, SST achieves high-quality trait and part segmentation with merely one labeled image per species, a breakthrough for analyzing specimen images. We further develop a cycle-consistent loss to fine-tune the model, again using only one labeled image. Additionally, we highlight the broader potential of SST, including one-shot instance segmentation on images taken in the wild and trait-based image retrieval.
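The cycle-consistent fine-tuning idea admits a compact sketch: propagate the one annotated mask to an unlabeled pseudo-frame, propagate the result back, and penalize disagreement with the original annotation, so no label is needed on the intermediate frame. The NumPy version below is an illustrative stand-in (a soft-IoU-style discrepancy with generic `forward`/`backward` propagation callables), not the paper's exact loss or training code.

```python
import numpy as np

def cycle_consistency_loss(mask, forward, backward):
    """Sketch of a cycle-consistent objective: a forward-then-backward
    round trip of the annotated mask should reproduce it. `forward` and
    `backward` stand in for the tracker's propagation steps; only the
    single annotated mask is required as supervision."""
    roundtrip = backward(forward(mask))
    # soft-IoU-style discrepancy in [0, 1]; 0 means a perfect cycle
    inter = np.minimum(mask, roundtrip).sum()
    union = np.maximum(mask, roundtrip).sum()
    return 1.0 - inter / max(union, 1e-8)
```

With propagation steps that are exact inverses (say, shifting a mask down two rows and back up), the loss is zero; a drifting backward step yields a positive loss, which is the signal used to fine-tune the tracker.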