🤖 AI Summary
To address the lack of pre-trained models for sound classification and transfer learning beyond birds in bioacoustics, this paper introduces Perch 2.0, a large-scale pre-training framework spanning multiple taxonomic groups. Methodologically, it incorporates fine-grained species classification into a self-distillation pipeline, jointly optimizing a prototype-learning classifier and a new source-prediction objective to strengthen the supervised signal and learn transferable representations. The model achieves state-of-the-art performance on the BirdSet and BEANS benchmarks and outperforms specialized marine models on marine transfer-learning tasks despite having almost no marine training data, demonstrating strong generalization and cross-domain adaptability. Key contributions include: (i) extending supervised bioacoustic pre-training from birds to a large multi-taxa dataset; (ii) combining a prototype-learning classifier with a source-prediction training criterion under self-distillation; and (iii) demonstrating efficient bioacoustic representation transfer with almost no target-domain training data.
📝 Abstract
Perch is a performant pre-trained model for bioacoustics. It was trained in a supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species and strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.
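The abstract describes jointly training a prototype-learning species classifier alongside a source-prediction criterion. The following is a minimal sketch of what such a joint objective could look like: class prototypes scored by cosine similarity against L2-normalized embeddings, plus a cross-entropy head predicting the recording source. All function names, the temperature, and the loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prototype_logits(emb, prototypes, temperature=0.1):
    # cosine similarity between L2-normalized embeddings and class prototypes,
    # scaled by an assumed temperature
    e = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    return (e @ p.T) / temperature

def joint_loss(emb, species_protos, source_weights,
               species_labels, source_labels, alpha=1.0):
    """Hypothetical combined objective: prototype-based species
    cross-entropy plus a linear source-prediction cross-entropy."""
    n = len(species_labels)
    # species term: cross-entropy over prototype similarities
    p_species = softmax(prototype_logits(emb, species_protos))
    l_species = -np.log(p_species[np.arange(n), species_labels]).mean()
    # source term: cross-entropy over a linear source-prediction head
    p_source = softmax(emb @ source_weights)
    l_source = -np.log(p_source[np.arange(n), source_labels]).mean()
    # alpha is an assumed weighting between the two objectives
    return l_species + alpha * l_source
```

Intuitively, the species term pulls embeddings of the same species toward a shared prototype, while the source term forces the representation to retain recording-level context; the paper's self-distillation would sit on top of such an objective.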