Perch 2.0: The Bittern Lesson for Bioacoustics

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address limited cross-species (beyond birds) sound classification and transfer learning in bioacoustics, this paper introduces a large-scale self-distillation pretraining framework spanning multiple taxonomic groups. Methodologically, it incorporates fine-grained species classification into the self-distillation pipeline, jointly optimizing a prototype-based classifier and a source-prediction objective to strengthen the supervised signal and learn domain-invariant representations. The resulting model achieves state-of-the-art performance on the BirdSet and BEANS benchmarks and outperforms specialized marine models on marine transfer learning tasks despite having almost no marine training data, demonstrating strong generalization and cross-domain adaptability. Key contributions: (i) a multi-taxa self-distillation pretraining paradigm for bioacoustics; (ii) a joint prototype-classification and source-prediction learning mechanism; and (iii) efficient bioacoustic representation transfer with almost no target-domain training data.

📝 Abstract
Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.
Problem

Research questions and friction points this paper is trying to address.

Expands bioacoustic model to multi-taxa from avian-only training
Improves species classification via self-distillation and prototype-learning
Achieves state-of-the-art performance on BirdSet and BEANS benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-distillation with prototype-learning classifier
Multi-taxa dataset for expanded training
Source-prediction training criterion
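The paper is summarized here at a high level only, but the two training ideas above can be illustrated with a minimal sketch: a prototype-learning classifier scores each embedding by its (temperature-scaled cosine) similarity to learned per-class prototype vectors, and self-distillation trains the student to match a teacher's soft class predictions. All function names, the cosine-similarity formulation, and the temperature value below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-9):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prototype_logits(embeddings, prototypes, temperature=0.1):
    """Class scores as temperature-scaled cosine similarity between
    embeddings (batch, dim) and learned class prototypes (classes, dim)."""
    sims = l2_normalize(embeddings) @ l2_normalize(prototypes).T
    return sims / temperature

def log_softmax(z, axis=-1):
    """Numerically stable log-softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def self_distillation_loss(student_logits, teacher_logits):
    """Cross-entropy between the teacher's soft targets (held fixed,
    e.g. from an EMA copy of the model) and the student's predictions."""
    targets = np.exp(log_softmax(teacher_logits))
    return float(-(targets * log_softmax(student_logits)).sum(axis=-1).mean())
```

In a full training setup the teacher would typically be a frozen or exponentially averaged copy of the student, and this distillation term would be combined with the supervised species-classification and source-prediction losses; those details are not specified in this summary.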
Bart van Merriënboer
Google DeepMind
Vincent Dumoulin
Research Scientist
Jenny Hamer
Google DeepMind
Lauren Harrell
Google Research
Andrea Burns
Google DeepMind
Tom Denton
Google DeepMind