ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe

📅 2025-04-29
đŸ€– AI Summary
Existing tools for the automated acoustic identification of European insects are limited in scope, and training them to recognize species across contexts requires large, ecologically heterogeneous acoustic datasets. To address this, the authors introduce ECOSoundSet (European Cicadidae and Orthoptera Sound dataSet), which comprises 10,653 recordings of 200 orthopteran and 24 cicada species (217 and 26 taxa, respectively, when subspecies are included) occurring in North, Central, and temperate Western Europe. The dataset combines weakly labeled recordings, for which only the presence of the target species at some point in the recording is known, with strongly labeled recordings, in which the time and frequency range of each insect sound is annotated. The strongly labeled recordings come with a train/validation/test split in approximate proportions of 0.8, 0.1, and 0.1. The dataset is intended to complement recordings already available online for training deep learning models that classify orthopterans and cicadas in this region.

📝 Abstract
Currently available tools for the automated acoustic recognition of European insects in natural soundscapes are limited in scope. Large and ecologically heterogeneous acoustic datasets are currently needed for these algorithms to cross-contextually recognize the subtle and complex acoustic signatures produced by each species, thus making the availability of such datasets a key requisite for their development. Here we present ECOSoundSet (European Cicadidae and Orthoptera Sound dataSet), a dataset containing 10,653 recordings of 200 orthopteran and 24 cicada species (217 and 26 respective taxa when including subspecies) present in North, Central, and temperate Western Europe (Andorra, Belgium, Denmark, mainland France and Corsica, Germany, Ireland, Luxembourg, Monaco, Netherlands, United Kingdom, Switzerland), collected partly through targeted fieldwork in South France and Catalonia and partly through contributions from various European entomologists. The dataset is composed of a combination of coarsely labeled recordings, for which we can only infer the presence, at some point, of their target species (weak labeling), and finely annotated recordings, for which we know the specific time and frequency range of each insect sound present in the recording (strong labeling). We also provide a train/validation/test split of the strongly labeled recordings, with respective approximate proportions of 0.8, 0.1 and 0.1, in order to facilitate their incorporation in the training and evaluation of deep learning algorithms. This dataset could serve as a meaningful complement to recordings already available online for the training of deep learning algorithms for the acoustic classification of orthopterans and cicadas in North, Central, and temperate Western Europe.
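The distinction between weak and strong labels, and the approximate 0.8/0.1/0.1 split of the strongly labeled recordings, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the class names, field names, file names, and species placeholders are hypothetical and do not reflect the dataset's actual file layout, and the random shuffle stands in for whatever procedure the authors used, which the abstract does not describe.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SoundEvent:
    """One strongly labeled insect sound: a time-frequency box plus a species."""
    species: str      # placeholder label, not a name taken from the dataset
    start_s: float    # event onset within the recording, in seconds
    end_s: float      # event offset, in seconds
    low_hz: float     # lower bound of the frequency range, in hertz
    high_hz: float    # upper bound of the frequency range, in hertz


@dataclass
class Recording:
    """A recording with a weak label and, optionally, strong annotations."""
    path: str                                  # hypothetical file name
    target_species: str                        # weak label: present at some point
    events: Optional[List[SoundEvent]] = None  # None means weakly labeled only

    @property
    def is_strong(self) -> bool:
        return self.events is not None


def split_strong(
    recordings: List[Recording],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[Recording], List[Recording], List[Recording]]:
    """Shuffle the strongly labeled recordings and cut them into three parts."""
    strong = [r for r in recordings if r.is_strong]
    random.Random(seed).shuffle(strong)
    n_train = round(fractions[0] * len(strong))
    n_val = round(fractions[1] * len(strong))
    return (
        strong[:n_train],
        strong[n_train:n_train + n_val],
        strong[n_train + n_val:],
    )


# Illustrative data: one weakly labeled and twenty strongly labeled recordings.
weak_only = Recording("rec_000.wav", "species_a")
annotated = [
    Recording(
        f"rec_{i:03d}.wav",
        "species_b",
        [SoundEvent("species_b", 1.0, 2.5, 8000.0, 20000.0)],
    )
    for i in range(1, 21)
]
train, val, test = split_strong([weak_only] + annotated)
print(len(train), len(val), len(test))
```

With twenty strongly labeled recordings, the three parts contain 16, 2, and 2 recordings, and the weakly labeled recording is left out of the split. A plain random shuffle is the simplest choice; a split used for real evaluation would probably need to balance species across the three parts, which this sketch does not attempt.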
Problem

Research questions and friction points this paper is trying to address.

Limited automated acoustic recognition tools for European insects
Need for large, ecologically diverse datasets for species identification
Lack of finely annotated insect sound recordings for deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finely annotated dataset for insect acoustic identification
Combination of weak and strong labeled recordings
Train/validation/test split for deep learning algorithms
David Funosas
Station d’Écologie ThĂ©orique et ExpĂ©rimentale (SETE, CNRS), Moulis, France; UniversitĂ© Paul Sabatier - Toulouse III, UPS, Toulouse, France; Centre de Recherche sur la BiodiversitĂ© et l’Environnement - UMR 5300 CNRS-INPT-IRD-UT, Toulouse, France
Elodie Massol
UniversitĂ© Paul Sabatier - Toulouse III, UPS, Toulouse, France; Centre de Recherche sur la BiodiversitĂ© et l’Environnement - UMR 5300 CNRS-INPT-IRD-UT, Toulouse, France
Yves Bas
Centre d’Ecologie et des Sciences de la Conservation (CESCO, MNHN), Centre National de la Recherche Scientifique, Sorbonne UniversitĂ©, Paris, France; Dynafor, INRAE-INPT, University of Toulouse, Castanet-Tolosan, France
Svenja Schmidt
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, D-79104 Freiburg, Germany
Dominik Arend
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, D-79104 Freiburg, Germany
Alexander Gebhard
CHI - Chair of Health Informatics, MRI, Technical University of Munich, Germany
Luc Barbaro
Dynafor, INRAE-INPT, University of Toulouse, Castanet-Tolosan, France
Sebastian Konig
CHI - Chair of Health Informatics, MRI, Technical University of Munich, Germany
Rafael Carbonell Font
InstituciĂł Catalana d’HistĂČria Natural (ICHN), Barcelona, Spain
David Sannier
Fernand Deroussen
Nashvert Naturophonia, Val Maravel, France
JérÎme Sueur
Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire Naturelle (MNHN), CNRS, Sorbonne Université, Ecole Pratique des Hautes Etudes - PSL, Université des Antilles, Paris, France
Christian Roesti
Orthoptera.ch, Bern, Switzerland
T. Trilar
Slovenian Museum of Natural History (PMSL), Ljubljana, Slovenia
W. Forstmeier
Department of Ornithology, Max Planck Institute for Biological Intelligence, Seewiesen, Germany
Lucas Roger
Centre d’Ecologie et des Sciences de la Conservation (CESCO, MNHN), Centre National de la Recherche Scientifique, Sorbonne UniversitĂ©, Paris, France; PatriNat (OFB, MNHN), 75005 Paris, France
E. Matheu
Piotr Guzik
Jagiellonian University
Julien Barataud
Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire Naturelle (MNHN), CNRS, Sorbonne Université, Ecole Pratique des Hautes Etudes - PSL, Université des Antilles, Paris, France
L. Pelozuelo
Stéphane Puissant
Nashvert Naturophonia, Val Maravel, France
Sandra Mueller
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, D-79104 Freiburg, Germany
Björn W. Schuller
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, D-79104 Freiburg, Germany
Jose M. Montoya
Station d’Écologie ThĂ©orique et ExpĂ©rimentale (SETE, CNRS), Moulis, France
Andreas Triantafyllopoulos
Technical University of Munich
M. Cauchoix
Station d’Écologie ThĂ©orique et ExpĂ©rimentale (SETE, CNRS), Moulis, France