🤖 AI Summary
Underwater acoustic target recognition is hindered by the scarcity of labeled data and the complexity of marine environments. To address this challenge, this work proposes the UATR-SLM framework, which, for the first time, transfers large-scale human speech foundation models to this domain. The approach uses a pretrained speech large model as the acoustic encoder, directly reuses its speech feature extraction pipeline, and appends a lightweight classifier, eliminating the need to train from scratch. Evaluated on the DeepShip and ShipsEar datasets, the method achieves over 99% in-domain accuracy and up to 96.67% cross-domain accuracy, and it remains robust across varying signal lengths. These results indicate strong transfer potential of speech foundation models for underwater acoustic tasks.
📝 Abstract
Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging due to limited labeled data and the complexity of ocean environments. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier. Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, maintains strong robustness across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR, establishing a promising paradigm for leveraging speech foundation models in underwater acoustics.
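The three-stage structure described in the abstract (speech feature pipeline, encoder, lightweight classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `FrozenEncoder` is a hypothetical stand-in with random weights where a pretrained SLM would go, the log-spectrogram settings (`n_fft=400`, `hop=160`) are assumed speech-style defaults, and mean pooling over time is an assumed way to handle variable-length signals. The class count of 4 in the usage note is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_spectrogram(wave, n_fft=400, hop=160):
    """Speech-style front end (assumed settings): framed, windowed log-magnitude spectrum."""
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return np.log(spec + 1e-6)  # shape: (n_frames, n_fft // 2 + 1)

class FrozenEncoder:
    """Hypothetical stand-in for a pretrained speech encoder; weights are random here."""
    def __init__(self, in_dim, hidden_dim):
        self.W = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)

    def __call__(self, feats):
        return np.tanh(feats @ self.W)  # shape: (n_frames, hidden_dim)

class LinearHead:
    """Lightweight classifier: one linear layer on the pooled embedding."""
    def __init__(self, hidden_dim, n_classes):
        self.W = np.zeros((hidden_dim, n_classes))
        self.b = np.zeros(n_classes)

    def logits(self, pooled):
        return pooled @ self.W + self.b

def classify(wave, encoder, head):
    feats = log_spectrogram(wave)   # 1. reuse a speech-style feature pipeline
    hidden = encoder(feats)         # 2. encode frames with the (stand-in) encoder
    pooled = hidden.mean(axis=0)    # 3. mean-pool over time (assumption)
    return head.logits(pooled)      # 4. lightweight classifier on top
```

As a usage example, `classify(wave, FrozenEncoder(201, 32), LinearHead(32, 4))` returns a vector of 4 class scores for a one-dimensional `wave` of any length of at least 400 samples. Because the time axis is averaged away before the classifier, the output shape does not depend on the signal length.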