Generative AI-based data augmentation for improved bioacoustic classification in noisy environments

📅 2024-12-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: In high-noise environments such as wind farms, expert-labelled acoustic data for rare bird species are scarce, and many conventional image augmentation techniques are ill-suited to spectrograms, whose axes carry frequency and time structure. Method: We investigate generative AI models as augmentation tools for spectrogram synthesis, comparing Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPM). We present a new dataset of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which are expert-labelled, and train an ensemble of classifiers on real and DDPM-synthesized spectrograms combined. Contribution/Results: DDPM performed particularly well in both the realism of the generated spectrograms and the accuracy of the resulting classifier. Measured against highly confident BirdNET predictions, the ensemble trained on real and synthetic data reaches 92.6% accuracy, compared with 90.5% for real data alone (a gain of 2.1 percentage points). The code is publicly available.

📝 Abstract

1. Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data and is cheaper to obtain than expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms.

2. We investigate two generative AI models as data augmentation tools to synthesise spectrograms and supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs). The latter performed particularly well in terms of both realism of generated spectrograms and accuracy in a resulting classification task.

3. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls from wind farm sites in Ireland, approximately 800 samples of which have been labelled by experts. Wind farm data are particularly challenging for classification models given the background wind and turbine noise.

4. Training an ensemble of classification models on real and synthetic data combined gave 92.6% accuracy (and 90.5% with just the real data) when compared with highly confident BirdNET predictions.

5. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to bring about a step-change in our capacity to develop reliable AI-based detection of rare species. Our code is available at https://github.com/gibbona1/SpectrogramGenAI.
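
For readers unfamiliar with DDPMs, the forward (noising) half of the model can be illustrated in a few lines. The sketch below is not taken from the paper's repository: the linear noise schedule, its endpoints (1e-4 to 0.02), the 1000 steps, and the function names are all assumptions chosen for illustration. It applies the standard closed-form sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps to a toy spectrogram.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed defaults, not the paper's)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_spectrogram(spec, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in row]
        for row in spec
    ]


rng = random.Random(0)
# Toy 4x6 "spectrogram" (frequency bins x time frames) with values in [-1, 1].
spec = [[math.sin(0.5 * f + 0.3 * t) for t in range(6)] for f in range(4)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

slightly_noisy = noise_spectrogram(spec, 10, alpha_bars, rng)      # mostly signal
nearly_pure_noise = noise_spectrogram(spec, 999, alpha_bars, rng)  # mostly noise
```

A trained DDPM learns to reverse this process step by step, so sampling starts from pure noise and ends at a synthetic spectrogram. The learned denoising network is omitted here.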

Problem

Research questions and friction points this paper is trying to address.

Improving bioacoustic classification in noisy environments using generative AI

Addressing data scarcity for rare species in AI-based classification models

Enhancing classification accuracy with synthetic spectrograms from generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI models for spectrogram synthesis

ACGAN and DDPMs enhance audio data diversity

Ensemble training with real and synthetic data
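
The evaluation idea behind the ensemble result can be sketched as follows. The summary does not say how the ensemble members are combined, so the probability averaging (soft voting) used here is an assumption, as are the function names and the made-up probabilities. The `reference_labels` list stands in for labels derived from highly confident BirdNET predictions.

```python
def soft_vote(member_probs):
    """Average class probabilities across ensemble members for one clip."""
    num_classes = len(member_probs[0])
    return [
        sum(probs[c] for probs in member_probs) / len(member_probs)
        for c in range(num_classes)
    ]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def ensemble_accuracy(probs_per_clip, reference_labels):
    """Fraction of clips whose top ensemble class matches the reference label."""
    predictions = [argmax(soft_vote(members)) for members in probs_per_clip]
    hits = sum(pred == ref for pred, ref in zip(predictions, reference_labels))
    return hits / len(reference_labels)


# Made-up example: 3 clips, 2 ensemble members, 3 classes.
probs_per_clip = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],
    [[0.3, 0.3, 0.4], [0.5, 0.4, 0.1]],
]
reference_labels = [0, 1, 0]  # hypothetical stand-in for confident BirdNET labels
accuracy = ensemble_accuracy(probs_per_clip, reference_labels)
```

In this toy case the two members disagree on the second clip, and the averaged probabilities favour class 2 over the reference class 1, so the ensemble agrees with the reference on two of three clips.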
