🤖 AI Summary
This study addresses key challenges in video classification of cellular behaviors, including modeling objects without fixed boundaries, extracting spatiotemporal features across entire sequences, and capturing multicellular interactions. To this end, the authors establish the first video classification benchmark specifically designed for dynamic cellular behavior analysis and organize an international challenge to systematically evaluate 35 methods, spanning track-based feature classification, end-to-end deep learning, and hybrid spatiotemporal–tracking strategies. The work provides a comprehensive comparison between end-to-end approaches that eschew explicit tracking and traditional tracking-dependent methods, delineating their performance boundaries. It further demonstrates the efficacy of multimodal fusion and video-level spatiotemporal modeling, offering practical guidance for method selection in biological imaging and advancing the application of tracking-free spatiotemporal modeling in life sciences.
📝 Abstract
The classification of microscopy videos capturing complex cellular behaviors is crucial for understanding and quantifying the dynamics of biological processes over time. However, it remains a frontier in computer vision, requiring approaches that effectively model the shape and motion of objects without rigid boundaries, extract hierarchical spatiotemporal features from entire image sequences rather than static frames, and account for multiple objects within the field of view. To this end, we organized the Cell Behavior Video Classification Challenge (CBVCC), benchmarking 35 methods based on three approaches: classification of tracking-derived features, end-to-end deep learning architectures that learn spatiotemporal features directly from the entire video sequence without explicit cell tracking, and ensembling of tracking-derived and image-derived features. We discuss the results achieved by the participants and compare the potential and limitations of each approach, providing a basis for the development of computer vision methods for studying cellular dynamics.
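As a rough illustration of what "tracking-derived features" can look like, the sketch below computes a few common motility descriptors from a single cell trajectory. The function name `track_features`, the input format (a list of per-frame centroid coordinates), and the choice of descriptors are assumptions made for illustration; the abstract does not specify which features the challenge participants used.

```python
import math


def track_features(points, dt=1.0):
    """Compute simple motility descriptors from one cell track.

    points: list of (x, y) centroid positions, one per frame
            (hypothetical input format, not taken from the challenge).
    dt:     time between consecutive frames, assumed constant.
    """
    if len(points) < 2:
        raise ValueError("a track needs at least two positions")

    # Euclidean length of each frame-to-frame step.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]

    path_length = sum(steps)
    net_displacement = math.dist(points[0], points[-1])
    mean_speed = path_length / (dt * len(steps))

    # Straightness is 1.0 for a perfectly straight track and
    # approaches 0.0 for a track that returns to its start.
    straightness = net_displacement / path_length if path_length > 0 else 0.0

    return {
        "path_length": path_length,
        "net_displacement": net_displacement,
        "mean_speed": mean_speed,
        "straightness": straightness,
    }


# A straight track made of two steps of length 5.
features = track_features([(0, 0), (3, 4), (6, 8)], dt=1.0)
```

A feature vector of this kind could then be passed to any standard classifier, or concatenated with image-derived features in an ensembling setup of the sort the abstract describes.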