From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition

📅 2026-01-26
🤖 AI Summary
Underwater acoustic target recognition is hindered by the scarcity of labeled data and the complexity of marine environments. To address this challenge, this work proposes the UATR-SLM framework, which, for the first time, transfers large-scale human speech foundation models to this domain. The approach uses a pretrained speech large model as the acoustic encoder, reuses its speech feature extraction pipeline directly, and appends a lightweight classifier, so no model has to be trained from scratch. Evaluated on the DeepShip and ShipsEar datasets, the method achieves over 99% in-domain accuracy and up to 96.67% cross-domain accuracy, and it remains robust to varying signal lengths. These results support the strong transfer potential of speech foundation models for underwater acoustic tasks.

📝 Abstract
Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging due to limited labeled data and the complexity of ocean environments. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier. Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, maintains strong robustness across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR, establishing a promising paradigm for leveraging speech foundation models in underwater acoustics.
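The three stages named in the abstract (speech feature pipeline, SLM as acoustic encoder, lightweight classifier) can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `frame_features` is a toy stand-in for a real speech front end, `FrozenEncoder` is a fixed random projection standing in for a pretrained speech model, `LinearHead` is a hypothetical untrained classifier, and the class count of 4 is arbitrary. Mean pooling over frames is one simple way to accept clips of different lengths; the abstract does not say how the paper handles this.

```python
import math
import random


def frame_features(signal, frame_len=400, hop=160):
    """Toy front end (assumption): log energy and zero-crossing rate per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        feats.append([math.log(energy + 1e-10), crossings / (frame_len - 1)])
    return feats


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained speech model; weights never change."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, feats):
        # Map every frame's feature vector to an out_dim embedding.
        return [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in self.w]
                for f in feats]


def mean_pool(frames):
    """Average frame embeddings so clips of any length give one fixed-size vector."""
    return [sum(col) / len(frames) for col in zip(*frames)]


class LinearHead:
    """Lightweight classifier (assumption): one linear layer followed by softmax."""

    def __init__(self, in_dim, num_classes, seed=1):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def __call__(self, x):
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


def classify(signal, encoder, head):
    """Features -> frozen encoder -> pooling -> classifier probabilities."""
    feats = frame_features(signal)
    if not feats:
        raise ValueError("signal is shorter than one frame")
    return head(mean_pool(encoder(feats)))


# Synthetic noise clips of two different lengths (placeholder data, not real recordings).
rng = random.Random(42)
short_clip = [rng.uniform(-1, 1) for _ in range(1600)]
long_clip = [rng.uniform(-1, 1) for _ in range(8000)]

encoder = FrozenEncoder(in_dim=2, out_dim=8)
head = LinearHead(in_dim=8, num_classes=4)  # 4 classes is an arbitrary choice here
probs_short = classify(short_clip, encoder, head)
probs_long = classify(long_clip, encoder, head)
```

In a setup like this only the parameters of `LinearHead` would be fitted on labeled underwater recordings, which is what keeps the labeled-data requirement small. Whether the paper freezes the encoder or fine-tunes it is not stated in the abstract.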
Problem

Research questions and friction points this paper is trying to address.

Underwater acoustic target recognition
limited labeled data
complex ocean environments
speech large models
transfer learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech large models
underwater acoustic target recognition
transfer learning
foundation models
acoustic encoder
Mengcheng Huang
College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Xue Zhou
College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Chen Xu
Harbin Engineering University
natural language processing, machine translation, speech translation
Dapeng Man
College of Computer Science and Technology, Harbin Engineering University, Harbin, China