🤖 AI Summary
Automated phytoplankton identification and tracking is hindered by complex underwater backgrounds, small target sizes, frequent occlusions, and insufficient real-time performance. To address these challenges, this work introduces MPT, the first large-scale multi-phytoplankton tracking benchmark. MPT comprises 140 videos featuring 27 phytoplankton and zooplankton species across 14 realistic underwater background categories. We propose the Deviation-Corrected Multi-Scale Feature Fusion Tracker (DSFT), which uses a residual dual-path feature extraction architecture that combines inter-frame similarity modeling with multi-scale feature prediction to mitigate small-object drift and focus misalignment. Evaluated on MPT, DSFT achieves state-of-the-art accuracy at real-time speed for individual-level phytoplankton tracking. This work provides a standardized benchmark and a methodological framework for quantitative, dynamic monitoring of marine ecological systems.
📝 Abstract
Phytoplankton are crucial for aquatic ecosystems and provide valuable insights into ocean environments and ecosystem changes. Traditional phytoplankton monitoring methods are often complex and cannot deliver timely analysis. Deep learning algorithms therefore offer a promising approach to automated phytoplankton monitoring. However, the lack of large-scale, high-quality training datasets is a major bottleneck in advancing phytoplankton tracking. Herein, we propose a challenging benchmark dataset called multiple phytoplankton tracking (MPT), which covers diverse background information and motion variations during observation. The dataset includes 140 videos covering 27 phytoplankton and zooplankton species and 14 different backgrounds that simulate diverse and complex underwater environments. To enable accurate real-time phytoplankton observation, we introduce the deviation-corrected multiscale feature fusion tracker (DSFT), a multiobject tracking method designed to overcome two key issues: focus shifts during tracking and the loss of critical information on small targets when computing frame-to-frame similarity. To enhance efficiency, we incorporate an additional feature extractor that predicts residuals from the output of the standard feature extractor; this enables multiscale frame-to-frame similarity comparisons based on features from different extractor layers. Extensive experiments conducted on the MPT dataset validated the dataset's effectiveness and demonstrated the superior performance of the DSFT method in tracking phytoplankton, providing an effective solution for phytoplankton monitoring.
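The idea of comparing objects across frames at several feature scales, after adding a predicted residual to the base features, can be illustrated with a minimal sketch. This is not the DSFT implementation: the function names (`cosine`, `correct`, `multiscale_similarity`, `greedy_match`), the use of cosine similarity, the uniform scale weights, the greedy association step, and the toy feature vectors are all assumptions made here for illustration. The abstract does not specify any of them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def correct(base, residual):
    """Assumed correction step: add a predicted residual to a base feature."""
    return [b + r for b, r in zip(base, residual)]

def multiscale_similarity(feats_a, feats_b, weights=None):
    """Weighted mean of per-scale cosine similarities.

    feats_a and feats_b are lists of feature vectors, one per scale
    (standing in for features taken from different extractor layers).
    """
    if weights is None:
        weights = [1.0 / len(feats_a)] * len(feats_a)
    return sum(w * cosine(a, b) for w, a, b in zip(weights, feats_a, feats_b))

def greedy_match(prev_objs, curr_objs, threshold=0.5):
    """Greedy one-to-one association in order of decreasing similarity."""
    pairs = sorted(
        ((multiscale_similarity(p, c), i, j)
         for i, p in enumerate(prev_objs)
         for j, c in enumerate(curr_objs)),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i not in used_prev and j not in used_curr:
            matches.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return matches

# Toy data: two objects per frame, two feature scales per object.
prev = [
    [[1.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0, 0.0]],
]
curr = [
    [[0.1, 0.9], [0.0, 1.0, 0.1]],                          # resembles prev[1]
    [correct([0.8, 0.3], [0.1, -0.2]), [0.9, 0.1, 0.0]],    # resembles prev[0]
]
matches = greedy_match(prev, curr)
print(sorted(matches))
```

In this toy setup, each previous object is paired with the current object whose features agree with it at both scales. A real tracker would obtain the features and residuals from learned networks and would typically use a more robust assignment method than the greedy one assumed here.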