Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the cross-domain transferability of speech self-supervised models (HuBERT, WavLM, XEUS) to bioacoustic tasks, motivated by the scarcity of labeled data for animal sound detection and classification. The authors combine linear probing on time-averaged representations, time-aware downstream architectures, and an analysis of sensitivity to frequency range and noise to assess how different pretraining strategies affect recognition of animal sounds across taxa. Experiments on bioacoustic detection and classification tasks show that these speech representations are competitive with fine-tuned bioacoustic pre-trained models, and that noise-robust pre-training setups have a measurable impact on performance. The effectiveness of speech self-supervised models on non-speech data has so far been underexplored; these results suggest that speech-derived representations can serve as efficient, general-purpose features for bioacoustics.

📝 Abstract

Self-supervised speech models have demonstrated impressive performance in speech processing, but their effectiveness on non-speech data remains underexplored. We study the transfer learning capabilities of such models on bioacoustic detection and classification tasks. We show that models such as HuBERT, WavLM, and XEUS can generate rich latent representations of animal sounds across taxa. We analyze the models' properties with linear probing on time-averaged representations. We then extend the approach to account for the effect of time-wise information with other downstream architectures. Finally, we study the implications of frequency range and noise on performance. Notably, our results are competitive with fine-tuned bioacoustic pre-trained models and show the impact of noise-robust pre-training setups. These findings highlight the potential of speech-based self-supervised learning as an efficient framework for advancing bioacoustic research.
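
To make "linear probing on time-averaged representations" concrete, here is a minimal sketch, not the authors' implementation. It assumes each clip has already been encoded by a frozen model into a (frames × dimensions) feature matrix. Random synthetic features stand in for real HuBERT/WavLM/XEUS outputs, and the names `mean_pool`, `fit_linear_probe`, and `predict` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def mean_pool(frames):
    """Average frame-level features of shape (T, D) over time into one (D,) vector."""
    return frames.mean(axis=0)

def fit_linear_probe(X, y, n_classes, lr=0.1, epochs=300):
    """Fit a softmax linear classifier on frozen features with plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                               # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    return (X @ W + b).argmax(axis=1)

# Synthetic stand-in for encoder outputs (assumption: real features would
# come from a frozen speech model, one (T, D) matrix per audio clip).
rng = np.random.default_rng(0)
n_clips, dim, n_classes = 120, 16, 3
class_means = rng.normal(0.0, 2.0, size=(n_classes, dim))
labels = rng.integers(0, n_classes, size=n_clips)
clips = [class_means[c] + rng.normal(0.0, 1.0, size=(rng.integers(20, 50), dim))
         for c in labels]                             # variable-length clips

X = np.stack([mean_pool(f) for f in clips])           # (n_clips, dim)
X_train, X_test = X[:90], X[90:]
y_train, y_test = labels[:90], labels[90:]

# Standardize with training statistics only; the encoder itself is never updated.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
W, b = fit_linear_probe((X_train - mu) / sigma, y_train, n_classes)
test_acc = (predict((X_test - mu) / sigma, W, b) == y_test).mean()
print(f"held-out probe accuracy: {test_acc:.2f}")
```

Mean pooling discards the order of frames, which is why the abstract goes on to consider downstream architectures that account for time-wise information.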

Problem

Research questions and friction points this paper is trying to address.

Transfer learning from speech models to animal sounds

Effectiveness of self-supervised models on bioacoustic tasks

Analyzing model performance across frequency ranges and noise
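
One common way to study the effect of noise is to mix a clean recording with background noise at a chosen signal-to-noise ratio. The sketch below shows that standard scaling, under the assumption that both signals are 1-D arrays of equal length; it is not necessarily the procedure used in the paper, and `mix_at_snr` and `measured_snr_db` are hypothetical helper names.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals the target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return signal + scale * noise

def measured_snr_db(signal, mixture):
    """Recover the SNR of a mixture given the clean signal."""
    residual = mixture - signal
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(residual ** 2))

# Synthetic example: a 2 kHz tone as a stand-in for an animal call,
# plus white noise (both are assumptions made for illustration).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
call = np.sin(2 * np.pi * 2000 * t)
background = rng.normal(0.0, 1.0, size=sr)

noisy = mix_at_snr(call, background, snr_db=5.0)
snr_check = measured_snr_db(call, noisy)
```

Sweeping `snr_db` over a range of values and re-running a probe on the noisy clips gives a simple robustness curve.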

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer learning from speech to animal sounds

Self-supervised models generate rich bioacoustic representations

Noise-robust pre-training enhances bioacoustic classification performance
