🤖 AI Summary
In passive acoustic monitoring (PAM), birdcall recognition models trained on focal recordings suffer from domain shift relative to real-world passive soundscapes, which limits generalization. To address this, we propose a domain-invariant representation learning framework centered on ProtoCLR, a novel supervised contrastive loss that replaces pairwise sample comparisons with comparisons to class-level prototypes. This design substantially reduces computational complexity while enhancing cross-domain discriminability and invariance. Our method jointly optimizes prototype-based clustering and few-shot classification. Evaluated on the BIRB benchmark, it outperforms standard Supervised Contrastive Learning (SupCon) in few-shot recognition accuracy, demonstrating superior domain generalization, higher computational efficiency, and improved practicality for real-world PAM scenarios.
📝 Abstract
Passive acoustic monitoring (PAM) is crucial for bioacoustic research, enabling non-invasive species tracking and biodiversity monitoring. Citizen science platforms provide large annotated datasets of focal recordings, in which the target species is intentionally recorded. PAM, however, operates on passive soundscapes, and the resulting domain shift between focal and passive recordings challenges deep learning models trained on focal recordings. To address domain generalization, we leverage supervised contrastive learning, enforcing domain invariance across same-class examples from different domains. Additionally, we propose ProtoCLR, an alternative to the SupCon loss that reduces computational complexity by comparing examples to class prototypes rather than to one another pairwise. We evaluate few-shot classification on BIRB, a large-scale bird sound benchmark for assessing pre-trained bioacoustic models. Our findings suggest that ProtoCLR is a better alternative to SupCon.
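To make the prototype idea concrete, the following is a minimal NumPy sketch of a prototype-based contrastive loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact ProtoCLR formula, so the function name `proto_contrastive_loss`, the use of in-batch class means as prototypes, the softmax form, and the `temperature` value are all assumptions. The point it illustrates is the cost argument: each of the N examples is scored against C class prototypes (an N × C similarity matrix) rather than against every other example (N × N).

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def proto_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical prototype-based contrastive loss (assumed form).

    Each example is pulled toward the prototype of its own class and pushed
    away from the prototypes of the other classes present in the batch.
    """
    z = l2_normalize(np.asarray(embeddings, dtype=np.float64))   # (N, D)
    labels = np.asarray(labels).ravel()
    classes, class_idx = np.unique(labels, return_inverse=True)

    # Assumption: a prototype is the mean normalized embedding of its class.
    one_hot = np.eye(len(classes))[class_idx]                    # (N, C)
    prototypes = (one_hot.T @ z) / one_hot.sum(axis=0)[:, None]  # (C, D)

    # N x C similarities instead of the N x N matrix a pairwise loss needs.
    logits = (z @ prototypes.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)          # stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative log-probability of each example's own class prototype.
    return -log_prob[np.arange(len(z)), class_idx].mean()


if __name__ == "__main__":
    # Toy batch: two tight clusters standing in for two species.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(proto_contrastive_loss(emb, [0, 0, 1, 1]))
```

In a domain-generalization setting such as the one described above, a batch would mix focal and soundscape examples of the same class, so that a shared prototype pulls both domains toward the same region of the embedding space; how the batches are actually composed is not specified in the abstract.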