🤖 AI Summary
Addressing the scarcity of high-quality labeled data and the poor cross-domain generalization of underwater acoustic target recognition, this paper proposes an unsupervised contrastive representation learning method. It introduces the first integration of the Conformer architecture with the Variance-Invariance-Covariance Regularization (VICReg) loss for self-supervised pretraining on large-scale, publicly available, low-quality, unlabeled underwater acoustic data. This yields robust, discriminative, and noise-resilient general-purpose acoustic embeddings. Lightweight supervised fine-tuning then adapts the pretrained model to downstream tasks. Evaluated on two cross-domain tasks, vessel type classification and marine mammal vocalization classification, the method achieves significant improvements in classification accuracy and generalization performance. The results demonstrate both the effectiveness and the transferability of unsupervised pretraining for underwater acoustic analysis.
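The pretraining objective can be illustrated with a minimal sketch of the Variance-Invariance-Covariance Regularization (VICReg) loss as defined in the original VICReg paper: an invariance term (mean squared error between the embeddings of two views of the same recording), a variance term (a hinge that keeps each embedding dimension's batch standard deviation above γ = 1), and a covariance term (a penalty on off-diagonal covariance entries that decorrelates dimensions). The weights λ = μ = 25, ν = 1 and ε = 1e-4 are that paper's defaults and are assumed here, not taken from this work. The Conformer encoder, the augmentations that produce the two views, and any projector head are omitted; embeddings are plain Python lists so the sketch runs without third-party packages.

```python
import math
from typing import List

# A batch of embeddings: n rows (one per audio segment), d columns (embedding dims).
Matrix = List[List[float]]


def centered(z: Matrix) -> Matrix:
    """Subtract the per-dimension batch mean from every embedding."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    return [[x - m for x, m in zip(row, means)] for row in z]


def invariance_term(z_a: Matrix, z_b: Matrix) -> float:
    """Mean squared error between the embeddings of the two views."""
    n, d = len(z_a), len(z_a[0])
    sq = sum((a - b) ** 2 for ra, rb in zip(z_a, z_b) for a, b in zip(ra, rb))
    return sq / (n * d)


def variance_term(z: Matrix, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge keeping each dimension's std above gamma (guards against collapse)."""
    n, d = len(z), len(z[0])
    zc = centered(z)
    total = 0.0
    for j in range(d):
        var_j = sum(row[j] ** 2 for row in zc) / (n - 1)  # unbiased batch variance
        total += max(0.0, gamma - math.sqrt(var_j + eps))
    return total / d


def covariance_term(z: Matrix) -> float:
    """Squared off-diagonal covariance entries, summed and divided by d."""
    n, d = len(z), len(z[0])
    zc = centered(z)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c_ij = sum(row[i] * row[j] for row in zc) / (n - 1)
                total += c_ij ** 2
    return total / d


def vicreg_loss(z_a: Matrix, z_b: Matrix,
                lam: float = 25.0, mu: float = 25.0, nu: float = 1.0) -> float:
    """lam * invariance + mu * (both variance terms) + nu * (both covariance terms)."""
    return (lam * invariance_term(z_a, z_b)
            + mu * (variance_term(z_a) + variance_term(z_b))
            + nu * (covariance_term(z_a) + covariance_term(z_b)))


if __name__ == "__main__":
    # Two agreeing views with spread-out, uncorrelated dimensions: no penalty.
    spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    print(round(vicreg_loss(spread, spread), 4))  # → 0.0
    # Collapsed embeddings also agree perfectly, but the variance term penalizes them.
    collapsed = [[0.5, 0.5] for _ in range(4)]
    print(round(vicreg_loss(collapsed, collapsed), 4))  # → 49.5
```

The second case shows why the variance and covariance terms are needed: mapping every segment to the same vector makes the invariance term zero, so without the regularizers the encoder could trivially minimize the loss by collapsing.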
📝 Abstract
The rising level of sound pollution in marine environments poses a growing threat to ocean health, making it crucial to monitor underwater noise. By monitoring this noise, the sources responsible for the pollution can be mapped. Monitoring is performed by passively listening to these sounds, which generates large amounts of recorded data capturing a mix of sound sources such as ship activity and marine mammal vocalizations. Although machine learning offers a promising solution for automatic sound classification, current state-of-the-art methods rely on supervised learning, which requires a large amount of high-quality labeled data that is not publicly available. In contrast, a massive amount of lower-quality unlabeled data is publicly available, offering the opportunity to explore unsupervised learning techniques. This research explores this possibility by implementing an unsupervised Contrastive Learning approach: a Conformer-based encoder is optimized with the Variance-Invariance-Covariance Regularization loss on the lower-quality unlabeled data, and the learned representations are then transferred to the labeled data. Through classification tasks involving the recognition of ship types and marine mammal vocalizations, our method is shown to produce robust and generalizable embeddings. This shows the potential of unsupervised methods for various automatic underwater acoustic analysis tasks.
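For the supervised step that transfers the pretrained representations to labeled data, one lightweight option is a linear probe: keep the encoder frozen and train only a softmax classifier on its embeddings (full fine-tuning would also update the encoder). The abstract does not specify the classification head, so this choice is an assumption. `LinearHead` and the toy 2-D embeddings are hypothetical stand-ins for Conformer embeddings of ship or marine-mammal recordings, and training is plain full-batch gradient descent on the mean cross-entropy.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical softmax classifier trained on frozen embeddings (a linear probe)."""

    def __init__(self, dim: int, num_classes: int) -> None:
        self.w = [[0.0] * dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def logits(self, x: Sequence[float]) -> Vector:
        return [sum(wi * xi for wi, xi in zip(wk, x)) + bk
                for wk, bk in zip(self.w, self.b)]

    def predict(self, x: Sequence[float]) -> int:
        scores = self.logits(x)
        return max(range(len(scores)), key=scores.__getitem__)

    def mean_loss(self, xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
        """Mean cross-entropy over the labeled set."""
        return -sum(math.log(softmax(self.logits(x))[y]) for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            lr: float = 0.1, epochs: int = 500) -> None:
        """Full-batch gradient descent on the mean cross-entropy; only the head is trained."""
        n, k, d = len(xs), len(self.w), len(xs[0])
        for _ in range(epochs):
            grad_w = [[0.0] * d for _ in range(k)]
            grad_b = [0.0] * k
            for x, y in zip(xs, ys):
                p = softmax(self.logits(x))
                for c in range(k):
                    err = p[c] - (1.0 if c == y else 0.0)  # d(loss)/d(logit_c)
                    grad_b[c] += err / n
                    for j in range(d):
                        grad_w[c][j] += err * x[j] / n
            for c in range(k):
                self.b[c] -= lr * grad_b[c]
                for j in range(d):
                    self.w[c][j] -= lr * grad_w[c][j]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings standing in for frozen encoder outputs of three classes.
    xs = [[3.0, 0.0], [3.0, 1.0], [0.0, 3.0], [1.0, 3.0], [-3.0, -3.0], [-3.0, -2.0]]
    ys = [0, 0, 1, 1, 2, 2]
    head = LinearHead(dim=2, num_classes=3)
    head.fit(xs, ys)
    print([head.predict(x) for x in xs])  # → [0, 0, 1, 1, 2, 2]
```

Training only the head keeps the supervised stage cheap and makes downstream accuracy a fairly direct measure of how linearly separable the pretrained embeddings already are.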