🤖 AI Summary
Real-time, energy-efficient, and accurate multi-species avian acoustic monitoring on low-power edge devices remains a critical challenge. This paper introduces a practical edge framework for bird acoustic monitoring: a semi-learnable spectro-temporal feature extractor, designed to capture the time-frequency characteristics of bird vocalizations, cuts computational overhead, and the lightweight WrenNet model built on it runs on resource-constrained platforms including the AudioMoth and Raspberry Pi. Evaluated on an expert-curated 70-species dataset, the system reaches up to 90.8% accuracy on acoustically distinctive species and 70.1% on the full task; on the AudioMoth, a single inference consumes only 77 mJ, and on a Raspberry Pi 3B+ the model is over 16× more energy-efficient than BirdNET. To the authors' knowledge, this is the first work enabling real-time multi-species bird audio classification on a microcontroller, establishing a scalable, edge-intelligent solution for large-scale, long-term, low-cost biodiversity monitoring.
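The summary does not spell out what makes the feature extractor "semi-learnable." One plausible reading, sketched below under stated assumptions, is a fixed STFT feeding a filterbank whose center frequencies and bandwidths are trained, initialized on a mel-like scale so learning only shifts the filters toward avian frequency bands. All class names, parameter values, and the Gaussian filter shape here are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SemiLearnableFrontend(nn.Module):
    """Sketch of a semi-learnable spectro-temporal frontend: a fixed
    (non-learned) STFT followed by a filterbank whose center frequencies
    and bandwidths are trainable. This split is an assumption about the
    paper's design, not its published architecture."""

    def __init__(self, sample_rate=16000, n_fft=512, n_filters=40):
        super().__init__()
        self.n_fft = n_fft
        # Fixed part: Hann-windowed STFT magnitudes.
        self.register_buffer("window", torch.hann_window(n_fft))
        self.register_buffer(
            "freqs", torch.linspace(0, sample_rate / 2, n_fft // 2 + 1))
        # Learnable part: filter centers/widths, mel-initialized so
        # training only nudges them toward bird-vocalization bands.
        mel_lo = self._hz_to_mel(50.0).item()
        mel_hi = self._hz_to_mel(sample_rate / 2).item()
        mel = torch.linspace(mel_lo, mel_hi, n_filters)
        self.centers = nn.Parameter(self._mel_to_hz(mel))
        self.bandwidths = nn.Parameter(torch.full((n_filters,), 300.0))

    @staticmethod
    def _hz_to_mel(hz):
        return 2595.0 * torch.log10(1.0 + torch.as_tensor(hz) / 700.0)

    @staticmethod
    def _mel_to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    def forward(self, wav):  # wav: (batch, samples)
        spec = torch.stft(wav, self.n_fft, hop_length=self.n_fft // 2,
                          window=self.window, return_complex=True).abs()
        # Gaussian filters over FFT bins: (n_filters, n_bins).
        fb = torch.exp(-0.5 * ((self.freqs[None, :] - self.centers[:, None])
                               / self.bandwidths[:, None].clamp(min=50.0)) ** 2)
        return torch.log1p(fb @ spec)  # (batch, n_filters, frames)
```

Keeping the STFT fixed while learning only the filterbank is what would make such a frontend cheap enough for a microcontroller: the learned part is a single small matrix multiply per frame.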
📝 Abstract
This paper introduces WrenNet, an efficient neural network enabling real-time multi-species bird audio classification on low-power microcontrollers for scalable biodiversity monitoring. We propose a semi-learnable spectral feature extractor that adapts to avian vocalizations, outperforming standard mel-scale and fully-learnable alternatives. On an expert-curated 70-species dataset, WrenNet achieves up to 90.8% accuracy on acoustically distinctive species and 70.1% on the full task. When deployed on an AudioMoth device (≤1 MB RAM), it consumes only 77 mJ per inference. Moreover, the proposed model is over 16× more energy-efficient than BirdNET when running on a Raspberry Pi 3B+. This work demonstrates the first practical framework for continuous, multi-species acoustic monitoring on low-power edge devices.
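The abstract does not describe WrenNet's architecture. As a rough sense of the model scale that fits a sub-1 MB RAM budget, the following is a hypothetical microcontroller-class classifier built from depthwise-separable convolutions, a standard recipe for tiny audio models; it is a sketch in that spirit, not the published WrenNet:

```python
import torch
import torch.nn as nn

class TinyBirdClassifier(nn.Module):
    """Hypothetical microcontroller-scale classifier (NOT the published
    WrenNet): depthwise-separable convolutions keep parameter count and
    working memory small enough for devices like the AudioMoth."""

    def __init__(self, n_species=70, width=32):
        super().__init__()

        def ds_block(cin, cout, stride):
            # Depthwise 3x3 followed by pointwise 1x1 convolution.
            return nn.Sequential(
                nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin),
                nn.Conv2d(cin, cout, 1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        self.net = nn.Sequential(
            nn.Conv2d(1, width, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            ds_block(width, width, 1),
            ds_block(width, 2 * width, 2),
            ds_block(2 * width, 2 * width, 1),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(2 * width, n_species)

    def forward(self, feats):   # feats: (batch, n_filters, frames)
        x = feats.unsqueeze(1)  # add a channel dimension
        return self.head(self.net(x).flatten(1))
```

A model at this width has on the order of 10^4 parameters, the regime where int8 quantization plausibly fits the ≤1 MB RAM constraint reported above; the 70-way output head matches the dataset size in the abstract.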