CoarseSoundNet: Building a reliable model for ecological soundscape analysis

📅 2026-05-20

🤖 AI Summary
Existing automated tools struggle to reliably distinguish biological, natural abiotic, and anthropogenic sounds under real-world passive acoustic monitoring (PAM) conditions. To address this challenge, this work proposes CoarseSoundNet, a deep learning model that introduces a silence class, class-specific thresholds, and duration constraints to establish a reproducible coarse-grained soundscape classification framework. The study systematically investigates model architectures, multi-source data composition, and evaluation strategies, substantially enhancing the model’s generalization and robustness in noisy environments. Experimental results demonstrate that the proposed method achieves strong performance on real PAM data, with acoustic indices derived from pre-filtered recordings showing high concordance with manual annotations, thereby effectively supporting ecological acoustic analysis.
📝 Abstract
A soundscape is composed of three types of sound: biophony (sounds made by animals), geophony (natural abiotic sounds) and anthropophony (sounds made by humans). A key research question in soundscape ecology is how these components interact, specifically how biophony responds to geophony and anthropophony. However, few analytical tools currently allow these components to be quantified separately. Recent machine learning (ML) approaches aim to support automated analysis but often rely on task-specific or clean data, limiting generalisation to noisy passive acoustic monitoring (PAM) recordings. This study presents a clear and reproducible structure for building ML models for coarse soundscape classification and introduces CoarseSoundNet, a deep learning model trained to distinguish biophony, geophony, and anthropophony under realistic PAM conditions. We systematically investigate model architectures, the influence of an additional training class, data composition, and evaluation strategies. Our findings suggest that model performance improves with additional PAM data, especially when it is similar to the target domain, and with an explicit silence class introduced during training. Class-specific decision thresholds and duration-based constraints further enhance performance, particularly for anthropophony and geophony. Error analyses reveal challenges for anthropophony due to masking effects, as well as confusions involving silence and insect sounds for geophony and biophony. Finally, we conduct an ecological case study which shows that pre-filtering recordings with CoarseSoundNet yields acoustic index trends comparable to ground-truth filtering, supporting its use as an effective preprocessing tool for ecoacoustic analyses.
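The class-specific decision thresholds and duration-based constraints mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-frame score layout, the threshold values, and the minimum-run rule are all assumptions chosen for illustration, and silence is simply treated as "no class active".

```python
import numpy as np

# Hypothetical class order; the paper's actual label layout is not given here.
CLASSES = ["biophony", "geophony", "anthropophony"]


def apply_thresholds(probs, thresholds):
    """Binarise per-frame scores with one threshold per class.

    probs: array of shape (n_frames, n_classes) with scores in [0, 1].
    thresholds: sequence of n_classes values (illustrative, not from the paper).
    """
    return probs >= np.asarray(thresholds)[None, :]


def enforce_min_duration(active, min_frames):
    """Switch off runs of consecutive active frames shorter than min_frames."""
    out = active.copy()
    n = len(active)
    start = None
    for i in range(n + 1):
        on = i < n and bool(active[i])
        if on and start is None:
            start = i  # a new run begins
        elif not on and start is not None:
            if i - start < min_frames:
                out[start:i] = False  # run too short: discard it
            start = None
    return out


def postprocess(probs, thresholds, min_frames):
    """Apply thresholds, then the duration constraint, class by class."""
    mask = apply_thresholds(probs, thresholds)
    columns = [
        enforce_min_duration(mask[:, c], min_frames[c])
        for c in range(mask.shape[1])
    ]
    return np.stack(columns, axis=1)


# Toy example: six frames, three classes, made-up scores.
probs = np.array([
    [0.9, 0.2, 0.6],
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1],
    [0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1],
])
result = postprocess(probs, thresholds=[0.5, 0.5, 0.7], min_frames=[2, 1, 1])
```

In this toy run, the isolated biophony detection in the fifth frame is discarded because it is shorter than the assumed two-frame minimum, and the anthropophony score of 0.6 in the first frame falls below its stricter assumed threshold of 0.7. In practice, such thresholds and durations would be tuned per class on validation data.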
Problem

Research questions and friction points this paper is trying to address.

soundscape ecology
biophony
geophony
anthropophony
passive acoustic monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

CoarseSoundNet
soundscape classification
passive acoustic monitoring
deep learning
ecological audio analysis
Alexander Gebhard
TUM University Hospital, CHI – Chair of Health Informatics, Ismaninger Str. 22, Munich, 81675, Bavaria, Germany; MCML – Munich Center for Machine Learning, Munich, Bavaria, Germany; Imperial College London, GLAM – Group on Language, Audio, & Music, London, UK
Andreas Triantafyllopoulos
Technical University of Munich
machine learning, affective computing, computer audition
Dominik Arend
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, Freiburg, 79104, Baden-Württemberg, Germany
Sandra Müller
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, Freiburg, 79104, Baden-Württemberg, Germany
Svenja Schmidt
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, Freiburg, 79104, Baden-Württemberg, Germany
Michael Scherer-Lorenzen
University of Freiburg, Faculty of Biology, Geobotany, Schaenzlestr. 1, Freiburg, 79104, Baden-Württemberg, Germany
Björn W. Schuller
TUM University Hospital, CHI – Chair of Health Informatics, Ismaninger Str. 22, Munich, 81675, Bavaria, Germany; MCML – Munich Center for Machine Learning, Munich, Bavaria, Germany; Imperial College London, GLAM – Group on Language, Audio, & Music, London, UK