🤖 AI Summary
To address the difficulty of identifying insect sounds and the data scarcity that impedes biodiversity monitoring, this study introduces InsectSound, the first large-scale, open acoustic dataset for deep learning. It comprises 26,399 field-recorded audio clips from 459 orthopteran and cicadid species, spanning a broad frequency range (0.1–96 kHz) and multiple sampling rates (8–96 kHz). The authors propose a standardized recording protocol and a spectrogram-based augmentation pipeline, establishing the first acoustic benchmark that combines multi-species coverage, high-frequency variability, and cross-device robustness. Benchmarking with ResNet-50 and EfficientNet-B0 demonstrates strong fine-grained classification performance. The study also systematically examines, for the first time, how high-frequency spectral variation and sampling-rate inconsistency affect model robustness. The work provides both a foundational dataset and a methodological framework for cross-domain audio representation learning and automated bioacoustic monitoring.
📝 Abstract
Automatic recognition of insect sound could help us understand changing biodiversity trends around the world, but insect sounds are challenging to recognize even for deep learning. We present a new dataset comprising 26,399 audio files from 459 species of Orthoptera and Cicadidae. It is the first large-scale dataset of insect sound that can be readily used for developing novel deep-learning methods. Its recordings were made with a variety of audio recorders at varying sample rates to capture the extremely broad range of frequencies that insects produce. We benchmark performance with two state-of-the-art deep learning classifiers, demonstrating good performance but also significant room for improvement in acoustic insect classification. This dataset can serve as a realistic test case for implementing insect monitoring workflows, and as a challenging basis for developing audio representation methods that can handle highly variable frequencies and/or sample rates.
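
To make the sample-rate point concrete: a recording sampled at rate f_s can only represent frequencies up to f_s / 2 (the Nyquist limit), so recordings made at different sample rates cover different frequency bands. The minimal Python sketch below illustrates this relationship. The function name and the listed sample rates are assumptions chosen for illustration; they are not taken from the dataset or from the authors' code.

```python
def nyquist_khz(sample_rate_khz: float) -> float:
    """Return the highest frequency (in kHz) representable at a given sample rate."""
    if sample_rate_khz <= 0:
        raise ValueError("sample rate must be positive")
    return sample_rate_khz / 2.0


# Hypothetical sample rates (kHz), for illustration only.
example_rates_khz = (16.0, 44.1, 192.0)

for rate in example_rates_khz:
    print(f"{rate} kHz sampling covers content up to {nyquist_khz(rate)} kHz")
```

A classifier trained on recordings like these therefore sees inputs whose usable bandwidth differs from file to file, which is one reason variable sample rates make representation learning harder.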