AI Summary
This work addresses two key challenges in underwater acoustic target recognition (UATR): the severe scarcity of labeled data and the limited effectiveness of cross-modal transfer learning. We systematically compare the transferability of ImageNet-pretrained vision models (e.g., ResNet, EfficientNet) against domain-specific audio pre-trained models (PANNs) for few-shot sonar classification. Spectrograms generated from raw sonar signals serve as input representations, and all models undergo identical data augmentation and fine-tuning protocols. Our results reveal that ImageNet-pretrained models achieve marginally higher classification accuracy than PANNs and are markedly more robust to low-sampling-rate sonar data, an effect we identify as a previously unreported critical factor governing pretraining-finetuning performance. This study establishes the efficacy of vision-based cross-modal transfer for underwater acoustic recognition and proposes a resource-efficient paradigm for low-data UATR.
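The spectrogram front end mentioned above can be sketched as follows. This is a minimal illustration using `scipy.signal.spectrogram`, not the paper's actual preprocessing: the window length, overlap, and the two sampling rates are hypothetical values chosen only to show how the time-frequency resolution of the input changes with sampling rate, the factor the summary highlights.

```python
import numpy as np
from scipy.signal import spectrogram

def sonar_to_spectrogram(signal, fs, nperseg=256, noverlap=128):
    """Convert a raw 1-D sonar signal into a log-magnitude spectrogram.

    nperseg/noverlap are illustrative defaults, not the paper's settings.
    """
    f, t, Sxx = spectrogram(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Log compression stabilises the large dynamic range of sonar returns
    # before the 2-D array is fed to an image or audio backbone.
    return np.log(Sxx + 1e-10)

# Synthetic 1-second tone at a low and a high sampling rate, mimicking
# the low- vs. high-rate comparison discussed in the summary.
for fs in (8_000, 32_000):
    tt = np.arange(fs) / fs
    ping = np.sin(2 * np.pi * 1_000 * tt)
    spec = sonar_to_spectrogram(ping, fs)
    print(fs, spec.shape)  # more samples per second -> more time frames
```

At a fixed window length, a lower sampling rate yields far fewer time frames (and a narrower Nyquist band), which is one plausible mechanism behind the sampling-rate sensitivity reported in this study.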
Abstract
Transfer learning is commonly employed to leverage large, pre-trained models and fine-tune them for downstream tasks. The most prevalent pre-trained models are initially trained on ImageNet; however, their ability to generalize can vary across data modalities. This study compares pre-trained Audio Neural Networks (PANNs) and ImageNet pre-trained models within the context of underwater acoustic target recognition (UATR). We observed that ImageNet pre-trained models slightly outperform pre-trained audio models in passive sonar classification. We also analyzed the impact of audio sampling rates on model pre-training and fine-tuning. This study contributes to transfer learning applications in UATR, illustrating the potential of pre-trained models to address the limitations caused by scarce labeled data in the UATR domain.