InsectSet459: an open dataset of insect sounds for bioacoustic machine learning

📅 2025-03-19
🤖 AI Summary
Insect sound identification is held back by scarce training data, which in turn limits acoustic biodiversity monitoring. This study introduces InsectSet459, the first large-scale open dataset of insect sound that is readily usable for deep learning. It comprises 26,399 audio files from 459 species of Orthoptera and Cicadidae, recorded on a variety of devices at varying sample rates in order to capture the very broad range of frequencies that insects produce. Benchmarks with two state-of-the-art deep learning classifiers show good performance while leaving significant room for improvement. The dataset is intended as a realistic test case for insect monitoring workflows and as a challenging basis for audio representation methods that must cope with highly variable frequencies and sample rates.

📝 Abstract
Automatic recognition of insect sound could help us understand changing biodiversity trends around the world, but insect sounds are challenging to recognize even for deep learning. We present a new dataset comprising 26,399 audio files from 459 species of Orthoptera and Cicadidae. It is the first large-scale dataset of insect sound that is readily usable for developing novel deep-learning methods. Its recordings were made with a variety of audio recorders using varying sample rates to capture the extremely broad range of frequencies that insects produce. We benchmark performance with two state-of-the-art deep learning classifiers, demonstrating good performance but also significant room for improvement in acoustic insect classification. This dataset can serve as a realistic test case for implementing insect monitoring workflows, and as a challenging basis for the development of audio representation methods that can handle highly variable frequencies and/or sample rates.

Problem

Research questions and friction points this paper is trying to address.

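Insect sounds are hard to recognize automatically, even with deep learning.
No large-scale insect sound dataset has so far been readily usable for developing deep-learning methods.
Insects produce an extremely broad range of frequencies, so recordings come at varying sample rates that classifiers must handle.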

Innovation

Methods, ideas, or system contributions that make the work stand out.

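A large-scale open dataset for deep learning: 26,399 audio files from 459 species of Orthoptera and Cicadidae.
Recordings made with a variety of audio recorders at varying sample rates, capturing the broad range of frequencies that insects produce.
Benchmarks with two state-of-the-art deep learning classifiers, showing good performance with significant room for improvement.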
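
To illustrate why varying sample rates are a challenge for audio models, here is a minimal Python sketch. It relies only on the standard sampling relationship that a recording at sample rate fs can represent frequencies up to fs / 2 (the Nyquist frequency). The sample rates, FFT size, and common target rate below are assumptions chosen for illustration and are not taken from the dataset; the function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2.0


def bin_width_hz(sample_rate_hz: float, n_fft: int) -> float:
    """Frequency spacing between adjacent FFT bins of a spectrogram."""
    return sample_rate_hz / n_fft


def retained_fraction(source_rate_hz: float, target_rate_hz: float) -> float:
    """Share of the source bandwidth kept after resampling to target_rate_hz.

    Downsampling discards everything above the target Nyquist frequency;
    upsampling keeps the full source band (it adds no new information).
    """
    return min(1.0, nyquist_hz(target_rate_hz) / nyquist_hz(source_rate_hz))


def summarize(rates_hz, n_fft, target_rate_hz):
    """Build one summary row per example sample rate."""
    rows = []
    for sr in rates_hz:
        rows.append(
            {
                "sample_rate_hz": sr,
                "nyquist_hz": nyquist_hz(sr),
                "bin_width_hz": round(bin_width_hz(sr, n_fft), 2),
                "kept_after_resampling": round(
                    retained_fraction(sr, target_rate_hz), 3
                ),
            }
        )
    return rows


if __name__ == "__main__":
    # Assumed, illustrative values (not taken from the dataset).
    example_rates_hz = [8_000, 44_100, 96_000]
    n_fft = 1024
    target_rate_hz = 44_100
    for row in summarize(example_rates_hz, n_fft, target_rate_hz):
        print(row)
```

With a fixed FFT size, the same spectrogram bin corresponds to a different frequency at each sample rate, while resampling everything to one common rate discards high-frequency content from the faster recordings. Both effects are plausible reasons why variable sample rates are treated as a representation-learning challenge here.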