Domain-Invariant Representation Learning of Bird Sounds

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In passive acoustic monitoring (PAM), birdcall recognition models trained on focal recordings suffer from a domain shift relative to real-world passive soundscapes, which limits their generalization. To address this, we propose a domain-invariant representation learning framework based on supervised contrastive learning, together with ProtoCLR, a supervised contrastive loss that compares each example with class-level prototypes instead of making pairwise sample comparisons. This design reduces computational complexity while still encouraging same-class examples from different domains to share a representation. The pre-trained models are evaluated with few-shot classification on the BIRB benchmark, where ProtoCLR compares favorably with standard Supervised Contrastive Learning (SupCon), suggesting that it is a better and computationally cheaper alternative for real-world PAM scenarios.
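The prototype idea summarized above can be illustrated with a minimal sketch. It assumes that prototypes are the batch means of L2-normalized embeddings (the anchor included), that each anchor receives a softmax over its similarities to all prototypes with a temperature `tau`, and that the loss is the cross-entropy toward the anchor's own class. The function names and `tau=0.1` are hypothetical; this is not the authors' ProtoCLR implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_prototypes(zs, labels):
    """Mean of the (already normalized) embeddings of each class present in the batch."""
    sums, counts = {}, {}
    for z, y in zip(zs, labels):
        sums[y] = [s + x for s, x in zip(sums.get(y, [0.0] * len(z)), z)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def prototype_contrastive_loss(embeddings, labels, tau=0.1):
    """Prototype-based supervised contrastive loss (hypothetical sketch).

    Each anchor is compared with the C class prototypes of the batch rather than
    with the other N - 1 examples, so the cost is N x C similarities.
    """
    zs = [l2_normalize(z) for z in embeddings]
    protos = class_prototypes(zs, labels)
    total = 0.0
    for z, y in zip(zs, labels):
        logits = {k: dot(z, p) / tau for k, p in protos.items()}
        m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denom - logits[y]  # cross-entropy toward the anchor's own prototype
    return total / len(zs)

# Toy batch: two classes, each with one "focal" and one "passive" embedding
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(prototype_contrastive_loss(emb, ["a", "a", "b", "b"]))  # close to zero
print(prototype_contrastive_loss(emb, ["a", "b", "a", "b"]))  # larger: classes overlap
```

Leaving the prototypes unnormalized is a design choice of this sketch; renormalizing them before the dot product is an equally plausible variant.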

📝 Abstract
Passive acoustic monitoring (PAM) is crucial for bioacoustic research, enabling non-invasive species tracking and biodiversity monitoring. Citizen science platforms provide large annotated datasets from focal recordings, in which the target species is intentionally recorded. However, PAM requires monitoring in passive soundscapes, which creates a domain shift between focal and passive recordings that challenges deep learning models trained on focal recordings. To address domain generalization, we leverage supervised contrastive learning, enforcing domain invariance across same-class examples from different domains. Additionally, we propose ProtoCLR, an alternative to the SupCon loss that reduces computational complexity by comparing examples to class prototypes instead of making pairwise comparisons. We conduct few-shot classification on BIRB, a large-scale bird sound benchmark for assessing pre-trained bioacoustic models. Our findings suggest that ProtoCLR is a better alternative to SupCon.
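For comparison, the supervised contrastive baseline mentioned in the abstract can be sketched in the single-view form of Khosla et al. (2020), averaged over anchors. The sketch shows why same-class examples from different domains are pulled together: every other example sharing the anchor's label counts as a positive, regardless of whether it comes from a focal or a passive recording. The function names, `tau=0.1`, and the toy domain assignment are assumptions for illustration only.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, tau=0.1):
    """Single-view supervised contrastive loss, averaged over anchors (sketch).

    Every other example with the anchor's label is a positive, whether it comes
    from a focal or a passive recording, so minimizing the loss pulls same-class
    examples from different domains together. Cost: N x (N - 1) similarities.
    """
    zs = [l2_normalize(z) for z in embeddings]
    n = len(zs)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = {j: sum(a * b for a, b in zip(zs[i], zs[j])) / tau for j in others}
        m = max(logits.values())  # numerically stable log-sum-exp over all non-anchor examples
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive in the batch contributes nothing
        total += sum(log_denom - logits[p] for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy batch: indices 0 and 2 stand for focal recordings, 1 and 3 for passive ones (hypothetical)
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(emb, ["a", "a", "b", "b"]))
```

Note the inner loop over `others`: every anchor is compared with every other example, which is the pairwise cost that a prototype-based loss avoids.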
Problem

Research questions and friction points this paper is trying to address.

Address domain shift in bioacoustic monitoring
Improve domain-invariant sound classification
Reduce computational complexity in contrastive learning
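The complexity concern in the last point can be made concrete by counting similarity evaluations per batch. This sketch assumes one similarity per anchor-candidate pair and ignores the O(N·d) cost of averaging prototypes; the batch size and class count are illustrative assumptions, not values from the paper.

```python
def supcon_comparisons(batch_size):
    """Pairwise loss: each anchor is compared with every other example in the batch."""
    return batch_size * (batch_size - 1)

def prototype_comparisons(batch_size, num_classes):
    """Prototype loss: each anchor is compared with one prototype per class in the batch."""
    return batch_size * num_classes

# Illustrative sizes (assumptions, not taken from the paper)
n, c = 256, 64
print(supcon_comparisons(n))        # 65280
print(prototype_comparisons(n, c))  # 16384
```

The saving grows as more examples in a batch share a class; when the number of classes approaches the batch size, the two counts become nearly equal.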
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-invariant representation learning
ProtoCLR reduces computational complexity
Few-shot classification on BIRB to evaluate pre-trained bioacoustic models
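The exact BIRB few-shot protocol is not described in this summary. As an assumption, one common way to score frozen pre-trained embeddings in a few-shot setting is a nearest-centroid classifier with cosine similarity, sketched below. The species labels, 2-D embeddings, and function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def fit_centroids(support, labels):
    """Average the normalized support embeddings of each class, then renormalize."""
    sums, counts = {}, {}
    for z, y in zip(support, labels):
        z = l2_normalize(z)
        sums[y] = [s + x for s, x in zip(sums.get(y, [0.0] * len(z)), z)]
        counts[y] = counts.get(y, 0) + 1
    return {y: l2_normalize([s / counts[y] for s in sums[y]]) for y in sums}

def predict(query, centroids):
    """Assign the query to the class whose centroid has the highest cosine similarity."""
    q = l2_normalize(query)
    return max(centroids, key=lambda y: sum(a * b for a, b in zip(q, centroids[y])))

# Toy 2-shot episode with hypothetical species labels and 2-D embeddings
support = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 0.9]]
labels = ["species_a", "species_a", "species_b", "species_b"]
centroids = fit_centroids(support, labels)
print(predict([0.8, 0.3], centroids))  # species_a
```

Because the classifier has no trainable parameters, accuracy in this setting depends only on how well the frozen embeddings cluster by species across domains.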