🤖 AI Summary
Cell tracking and mitosis detection in time-lapse microscopy face significant challenges—including ambiguous cell boundaries, high cellular density, morphological variability, and low signal-to-noise ratios—while existing deep learning methods suffer from heavy reliance on labor-intensive manual annotations and poor generalizability. To address these limitations, we propose the first zero-shot cell tracking framework that requires no training data or fine-tuning. Our approach pioneers the adaptation of the video foundation model SAM2 to biomedical image analysis, integrating unsupervised temporal consistency modeling with appearance-based feature matching. This enables robust cross-dataset cell tracking and precise mitosis event detection. Evaluated on multi-scale 2D and 3D time-series microscopy datasets, our method achieves state-of-the-art accuracy while substantially improving generalization capability and deployment efficiency. By eliminating the need for task-specific supervision, it establishes a new paradigm for high-throughput, label-free analysis of dynamic cellular behaviors.
📝 Abstract
Tracking cells and detecting mitotic events in time-lapse microscopy image sequences is a crucial task in biomedical research. However, it remains highly challenging due to dividing objects, low signal-to-noise ratios, indistinct boundaries, dense clusters, and the visually similar appearance of individual cells. Existing deep learning-based methods rely on manually labeled datasets for training, which is both costly and time-consuming. Moreover, their generalizability to unseen datasets remains limited due to the vast diversity of microscopy data. To overcome these limitations, we propose a zero-shot cell tracking framework by integrating Segment Anything 2 (SAM2), a large foundation model designed for general image and video segmentation, into the tracking pipeline. As a fully unsupervised approach, our method does not depend on or inherit biases from any specific training dataset, allowing it to generalize across diverse microscopy datasets without fine-tuning. Our approach achieves competitive accuracy in both 2D and large-scale 3D time-lapse microscopy videos while eliminating the need for dataset-specific adaptation.
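The linking stage of such a pipeline can be illustrated with a minimal, hypothetical sketch: given per-frame cell masks (e.g., produced by a segmentation model like SAM2), associate masks across consecutive frames by overlap, and flag a candidate mitosis event when one parent mask is claimed by exactly two daughter masks. The function names, greedy matching strategy, and IoU threshold below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def link_frames(prev_masks, next_masks, iou_thresh=0.3):
    """Greedily link cell masks between consecutive frames by IoU.

    Returns (links, divisions): `links` maps each next-frame mask index
    to its best-overlapping prev-frame parent; a parent claimed by two
    daughters is reported in `divisions` as a candidate mitosis event.
    Threshold and matching scheme are illustrative, not from the paper.
    """
    links, children = {}, {}
    for j, nm in enumerate(next_masks):
        ious = [mask_iou(pm, nm) for pm in prev_masks]
        i = int(np.argmax(ious)) if ious else -1
        if i >= 0 and ious[i] >= iou_thresh:
            links[j] = i
            children.setdefault(i, []).append(j)
    divisions = {p: c for p, c in children.items() if len(c) == 2}
    return links, divisions
```

A real system would combine such appearance/overlap cues with temporal-consistency propagation from the video model; this toy version only shows the association and split-detection bookkeeping.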