BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications

📅 2026-02-10
🤖 AI Summary
This work addresses the high computational cost and poor generalization of existing self-supervised audio encoders in bioacoustic tasks, which hinder their deployment on resource-constrained edge devices. The authors propose BioME, a lightweight encoder trained by layer-to-layer knowledge distillation from a high-capacity teacher model, and improve its ecological generalization through multi-domain self-supervised pretraining on speech, environmental sounds, and animal vocalizations. FiLM-based conditioning on modulation-aware acoustic features injects a digital signal processing (DSP) prior that improves feature disentanglement in the low-capacity student model. With only 25% of the teacher's parameter count, the proposed model matches or exceeds the teacher's performance across multiple bioacoustic tasks, which makes it suitable for IoT edge deployment.
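
The summary does not spell out how layers are paired during distillation, so the following is only a minimal sketch of a layer-to-layer objective under stated assumptions: the student has fewer layers than the teacher, each student layer is matched to a teacher layer by a uniform stride (the `layer_map` helper is hypothetical), both models share the same hidden size, and the per-layer loss is a mean squared error. The actual BioME mapping and loss may differ.

```python
def mse(a, b):
    """Mean squared error between two equally sized frame-by-feature matrices."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def layer_map(num_student_layers, num_teacher_layers):
    """Hypothetical uniform mapping: student layer i is paired with
    teacher layer (i + 1) * stride - 1, so the last layers line up."""
    stride = num_teacher_layers // num_student_layers
    return [(i + 1) * stride - 1 for i in range(num_student_layers)]


def layer_to_layer_loss(student_hidden, teacher_hidden):
    """Average the per-layer MSE over all paired (student, teacher) layers.

    Each argument is a list of layers; each layer is a list of frames;
    each frame is a list of feature values. Equal hidden sizes are assumed.
    """
    mapping = layer_map(len(student_hidden), len(teacher_hidden))
    losses = [
        mse(student_hidden[s], teacher_hidden[t])
        for s, t in enumerate(mapping)
    ]
    return sum(losses) / len(losses)


# Toy example: a 4-layer teacher and a 2-layer student, one frame, two features.
teacher = [[[0.0, 0.0]], [[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
student = [[[1.0, 1.0]], [[3.0, 5.0]]]
loss = layer_to_layer_loss(student, teacher)
```

In this toy case the student layers are paired with teacher layers 1 and 3 (zero-indexed). If the hidden sizes differed, a learned projection would be needed before the comparison, which this sketch leaves out.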

📝 Abstract
Passive acoustic monitoring has become a key strategy in biodiversity assessment, conservation, and behavioral ecology, especially as Internet-of-Things (IoT) devices enable continuous in situ audio collection at scale. While recent self-supervised learning (SSL)-based audio encoders, such as BEATs and AVES, have shown strong performance in bioacoustic tasks, their computational cost and limited robustness to unseen environments hinder deployment on resource-constrained platforms. In this work, we introduce BioME, a resource-efficient audio encoder designed for bioacoustic applications. BioME is trained via layer-to-layer distillation from a high-capacity teacher model, enabling strong representational transfer while reducing the parameter count by 75%. To further improve ecological generalization, the model is pretrained on multi-domain data spanning speech, environmental sounds, and animal vocalizations. A key contribution is the integration of modulation-aware acoustic features via FiLM conditioning, injecting a DSP-inspired inductive bias that enhances feature disentanglement in low-capacity regimes. Across multiple bioacoustic tasks, BioME matches or surpasses the performance of larger models, including its teacher, while being suitable for resource-constrained IoT deployments. For reproducibility, code and pretrained checkpoints are publicly available.
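
FiLM (feature-wise linear modulation) scales and shifts each feature channel using parameters predicted from a conditioning input: the output is `gamma * h + beta`. The sketch below illustrates that operation only. It assumes the conditioning vector is some fixed-size summary of modulation-aware acoustic features and that `gamma` and `beta` come from plain linear projections; the weight names and shapes are hypothetical and are not taken from the BioME code.

```python
def linear(weights, bias, vector):
    """Affine projection: one output per weight row."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM to a sequence of frames.

    hidden: list of frames, each a list of feature values.
    cond:   conditioning vector (assumed here to hold modulation features).
    The same per-channel scale (gamma) and shift (beta) are applied to
    every frame.
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [
        [g * h + b for g, h, b in zip(gamma, frame, beta)]
        for frame in hidden
    ]


# Toy example: two frames with two channels, a three-dimensional condition.
hidden = [[1.0, 2.0], [3.0, 4.0]]
cond = [1.0, 0.0, 2.0]
w_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
b_gamma = [1.0, 0.0]
w_beta = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
b_beta = [0.0, 0.0]
modulated = film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta)
```

Here `gamma` evaluates to `[2.0, 2.0]` and `beta` to `[0.0, 1.0]`, so every frame is doubled and the second channel is shifted by one. Because the modulation adds only two small projections, it is a cheap way to inject a prior into a low-capacity encoder.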
Problem

Research questions and friction points this paper is trying to address.

bioacoustics
resource-constrained
IoT
self-supervised learning
model deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer-to-layer distillation
modulation-aware features
FiLM conditioning
resource-efficient audio encoder
multi-domain pretraining
Heitor R. Guimarães
Institut national de la recherche scientifique (INRS - EMT), Montréal, QC, Canada
Abhishek Tiwari
Institut national de la recherche scientifique (INRS - EMT), Montréal, QC, Canada
Mahsa Abdollahi
Institut national de la recherche scientifique (INRS - EMT), Montréal, QC, Canada
Anderson R. Avila
Institut national de la recherche scientifique (INRS - EMT), Montréal, QC, Canada
Tiago H. Falk
Professor, INRS-EMT, University of Quebec, FIEEE
multimodal/sensory signal processing, affective computing, cognitive computing, context-awareness