Cross-modal learning for plankton recognition

📅 2026-03-17
🤖 AI Summary
This work addresses two challenges in plankton recognition: heavy reliance on large-scale manually annotated images and underutilization of synchronously acquired optical measurements, such as scattering and fluorescence profiles. To overcome this, the study introduces cross-modal self-supervised learning to the field for the first time. Treating paired image–optical-profile data as weak supervision, a contrastive-learning-based dual-modal encoder is pretrained in a self-supervised manner. Recognition is then performed efficiently using only a small number of labeled samples combined with a k-NN classifier. Requiring merely binary pairing information between the modalities, the proposed approach substantially reduces dependence on labeled data and achieves higher accuracy than an image-only self-supervised baseline, even with extremely limited annotations.
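The CLIP-style pairing objective described above can be illustrated as a symmetric contrastive (InfoNCE) loss over a batch of paired image/profile embeddings, where row i of each batch comes from the same particle. This is a minimal NumPy sketch, not the authors' implementation; the function name, temperature value, and embedding shapes are assumptions for illustration.

```python
import numpy as np

def clip_style_loss(image_emb, profile_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired image/profile embeddings.

    Row i of each (B, D) matrix is assumed to come from the same
    particle, so the diagonal of the similarity matrix holds the
    positive pairs and all off-diagonal entries are negatives.
    """
    # L2-normalise so dot products are cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    prof = profile_emb / np.linalg.norm(profile_emb, axis=1, keepdims=True)
    logits = img @ prof.T / temperature  # (B, B) similarity matrix

    def cross_entropy_diag(l):
        # softmax cross-entropy with the true class on the diagonal
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image->profile and profile->image directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Embeddings whose rows are correctly paired yield a lower loss than embeddings whose pairing has been scrambled, which is what drives the two encoders toward a shared representation.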

📝 Abstract
This paper considers self-supervised cross-modal coordination as a strategy enabling utilization of multiple modalities and large volumes of unlabeled plankton data to build models for plankton recognition. Automated imaging instruments facilitate the continuous collection of plankton image data on a large scale. Current methods for automatic plankton image recognition rely primarily on supervised approaches, which require labeled training sets that are labor-intensive to collect. On the other hand, some modern plankton imaging instruments complement image information with optical measurement data, such as scatter and fluorescence profiles, which currently are not widely utilized in plankton recognition. In this work, we explore the possibility of using such measurement data to guide the learning process without requiring manual labeling. Inspired by the concepts behind Contrastive Language-Image Pre-training, we train encoders for both modalities using only binary supervisory information indicating whether a given image and profile originate from the same particle or from different particles. For plankton recognition, we employ a small labeled gallery of known plankton species combined with a $k$-NN classifier. This approach yields a recognition model that is inherently multimodal, i.e., capable of utilizing information extracted from both image and profile data. We demonstrate that the proposed method achieves high recognition accuracy while requiring only a minimal number of labeled images. Furthermore, we show that the approach outperforms an image-only self-supervised baseline. Code available at https://github.com/Jookare/cross-modal-plankton.
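For the recognition stage, the abstract describes a $k$-NN classifier over a small labeled gallery of known plankton species. Below is a minimal sketch of that step, assuming cosine similarity in the learned embedding space; the paper's exact distance metric and how image and profile embeddings are fused are not specified here, and all names and species labels are illustrative.

```python
import numpy as np

def knn_recognise(query_emb, gallery_emb, gallery_labels, k=5):
    """Classify query embeddings by majority vote over the k nearest
    gallery embeddings (cosine similarity).

    The gallery is a small set of labeled embeddings of known
    plankton species, produced by the pretrained encoders.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                              # (Q, G) cosine similarities
    nearest = np.argsort(-sims, axis=1)[:, :k]  # indices of k most similar
    preds = []
    for row in nearest:
        labels, counts = np.unique(gallery_labels[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote
    return np.array(preds)
```

Because classification reduces to a nearest-neighbour lookup, adding a new species only requires appending a few labeled embeddings to the gallery, with no retraining.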
Problem

Research questions and friction points this paper is trying to address.

cross-modal learning
plankton recognition
self-supervised learning
multimodal data
unlabeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal learning
self-supervised learning
plankton recognition
contrastive learning
multimodal representation
Authors

Joona Kareinen
Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland

Veikka Immonen
Lappeenranta-Lahti University of Technology LUT, Lappeenranta, Finland

Tuomas Eerola
Associate Professor at Lappeenranta-Lahti University of Technology LUT
Image processing, computer vision, pattern recognition

Lumi Haraguchi
Finnish Environment Institute, Helsinki, Finland

Lasse Lensu
Lappeenranta-Lahti University of Technology LUT
Computer vision, machine vision, pattern recognition, medical image analysis, data analysis

Kaisa Kraft
Finnish Environment Institute, Helsinki, Finland

Sanna Suikkanen
Finnish Environment Institute, Helsinki, Finland

Heikki Kälviäinen
Professor of Computer Science and Engineering, Lappeenranta-Lahti University of Technology (LUT)
Computer vision, machine learning, pattern recognition, machine vision, animal biometrics