From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition

📅 2026-01-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Underwater acoustic target recognition (UATR) is hindered by scarce labeled data and the complexity of marine environments. To address this, the work proposes the UATR-SLM framework, which, for the first time, transfers large-scale human speech foundation models to this domain. The approach reuses the pretrained model's speech feature extraction pipeline, uses the speech large model as an acoustic encoder, and appends a lightweight classifier, so the encoder does not have to be trained from scratch. On the DeepShip and ShipsEar datasets, the method achieves over 99% in-domain accuracy and up to 96.67% cross-domain accuracy, and it remains robust across varying signal lengths. These results indicate that speech foundation models transfer well to underwater acoustic tasks.

📝 Abstract

Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging due to limited labeled data and the complexity of ocean environments. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier. Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, maintains strong robustness across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR, establishing a promising paradigm for leveraging speech foundation models in underwater acoustics.
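
The three-stage structure named in the abstract (speech feature pipeline, encoder, lightweight classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which SLM, pooling strategy, or classifier is used, so every name and constant below is a hypothetical stand-in. In particular, `FrozenEncoder` is a fixed random projection standing in for a pretrained speech model, and mean pooling over time is an assumed way of handling variable-length signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# All constants are illustrative assumptions, not values from the paper.
FRAME_LEN = 400    # 25 ms window at an assumed 16 kHz sample rate
HOP_LEN = 160      # 10 ms hop at an assumed 16 kHz sample rate
EMBED_DIM = 64     # width of the stand-in encoder
NUM_CLASSES = 4    # number of target classes


def speech_features(waveform):
    """Stand-in for the reused speech front end: log-magnitude spectra per frame."""
    n_frames = 1 + (len(waveform) - FRAME_LEN) // HOP_LEN
    # Index matrix that slices the waveform into overlapping frames.
    idx = np.arange(FRAME_LEN)[None, :] + HOP_LEN * np.arange(n_frames)[:, None]
    frames = waveform[idx] * np.hanning(FRAME_LEN)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spectrum + 1e-6)  # shape: (n_frames, FRAME_LEN // 2 + 1)


class FrozenEncoder:
    """Hypothetical placeholder for a pretrained SLM encoder; weights are never updated."""

    def __init__(self, in_dim, out_dim):
        self.weight = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def __call__(self, feats):
        return np.tanh(feats @ self.weight)  # shape: (n_frames, out_dim)


class LinearHead:
    """Lightweight classifier: a single linear layer followed by softmax."""

    def __init__(self, in_dim, num_classes):
        self.weight = 0.01 * rng.standard_normal((in_dim, num_classes))
        self.bias = np.zeros(num_classes)

    def __call__(self, pooled):
        logits = pooled @ self.weight + self.bias
        logits = logits - logits.max()  # subtract the max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()


encoder = FrozenEncoder(FRAME_LEN // 2 + 1, EMBED_DIM)
head = LinearHead(EMBED_DIM, NUM_CLASSES)


def classify(waveform):
    """Features -> encoder -> mean pooling over time -> class probabilities."""
    hidden = encoder(speech_features(waveform))
    pooled = hidden.mean(axis=0)  # fixed-size vector regardless of signal length
    return head(pooled)


# Synthetic noise stands in for hydrophone recordings of two different lengths.
short_clip = rng.standard_normal(16000)
long_clip = rng.standard_normal(48000)
probs_short = classify(short_clip)
probs_long = classify(long_clip)
```

Because the pooled vector has a fixed size, the same head accepts clips of any length that contains at least one frame. In a real transfer setup the random projection would be replaced by the pretrained encoder, and the classifier head (and possibly some encoder layers) would be trained on labeled underwater recordings.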

Problem

Research questions and friction points this paper is trying to address.

Underwater acoustic target recognition

limited labeled data

complex ocean environments

speech large models

transfer learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

speech large models

underwater acoustic target recognition

transfer learning