The iNaturalist Sounds Dataset

πŸ“… 2025-05-31
πŸ›οΈ Neural Information Processing Systems
πŸ“ˆ Citations: 2
✨ Influential: 1
πŸ“„ PDF
πŸ€– AI Summary
To address the scarcity of bioacoustic data, high annotation costs, and limited cross-taxon coverage, this study introduces iNatSounds, a large-scale, multi-taxon (birds, mammals, insects, reptiles, and amphibians), weakly labeled global bioacoustic dataset comprising 230,000 audio recordings of over 5,500 species, sourced from iNaturalist citizen science observations. Each variable-length recording carries a single species annotation, and the authors benchmark multiple backbone architectures under both multiclass and multilabel training objectives. A cross-dataset evaluation protocol validates the dataset's utility as a pretraining resource: despite weak labels, models pretrained on iNatSounds transfer effectively to strongly labeled downstream acoustic recognition benchmarks. The dataset is publicly released as a single freely accessible archive, establishing a foundational resource for ecological AI and participatory biodiversity monitoring.

πŸ“ Abstract
We present the iNaturalist Sounds Dataset (iNatSounds), a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist, a global citizen science platform. Each recording in the dataset varies in length and includes a single species annotation. We benchmark multiple backbone architectures, comparing multiclass classification objectives with multilabel objectives. Despite weak labeling, we demonstrate that iNatSounds serves as a useful pretraining resource by benchmarking it on strongly labeled downstream evaluation datasets. The dataset is available as a single, freely accessible archive, promoting accessibility and research in this important domain. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections, thereby contributing to the understanding of species compositions in diverse soundscapes.
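The pretraining claim in the abstract can be illustrated with a minimal linear-probe sketch: freeze a backbone pretrained on weakly labeled audio and train only a lightweight classification head on a strongly labeled downstream task. Everything below is a hypothetical stand-in (a frozen random projection plays the role of the pretrained iNatSounds backbone, and the downstream data is synthetic), not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a backbone pretrained on weakly labeled iNatSounds audio:
# a frozen random projection from 64-d inputs to a 16-d embedding.
# (The paper uses deep backbones; this is illustrative only.)
W_frozen = 0.1 * rng.normal(size=(64, 16))

def embed(x):
    """Frozen feature extractor: no gradients flow into W_frozen."""
    return np.tanh(x @ W_frozen)

# Synthetic "strongly labeled" downstream task.
X = rng.normal(size=(200, 64))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Linear probe: train only a logistic-regression head on frozen embeddings.
Z = embed(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid predictions
    w -= 0.1 * Z.T @ (p - y) / len(y)       # gradient step on the head only
    b -= 0.1 * (p - y).mean()

p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)).mean()
acc = ((p > 0.5) == (y > 0.5)).mean()
print(f"probe loss={loss:.3f} acc={acc:.2f}")
```

If the frozen embeddings carry usable signal, the probe's loss drops below the log(2) baseline of an untrained head, which is the same logic behind evaluating iNatSounds pretraining on strongly labeled benchmarks.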
Problem

Research questions and friction points this paper is trying to address.

Classifying diverse species sounds from a large audio dataset
Comparing multiclass and multilabel classification methods for audio
Enabling biodiversity research via accessible sound data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale audio dataset with species annotations
Benchmarking multiclass vs multilabel classification
Pretraining resource for downstream audio tasks
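The multiclass-vs-multilabel comparison above boils down to two training objectives: softmax cross-entropy, which assumes exactly one species per clip, and per-class sigmoid binary cross-entropy, which scores each species independently so several can be present at once. A minimal numpy sketch (the species scores below are invented for illustration; this is not the paper's code):

```python
import numpy as np

def multiclass_loss(logits, label):
    """Softmax cross-entropy: exactly one species per clip."""
    shifted = logits - logits.max()  # stabilize the exponentials
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def multilabel_loss(logits, targets):
    """Per-class sigmoid binary cross-entropy: species scored independently."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    return -(targets * np.log(probs + eps)
             + (1 - targets) * np.log(1 - probs + eps)).mean()

logits = np.array([2.0, -1.0, 0.5])  # scores for 3 hypothetical species
print(multiclass_loss(logits, label=0))                     # one species present
print(multilabel_loss(logits, np.array([1.0, 0.0, 1.0])))   # two species present
```

The multilabel form matters for soundscapes, where a single recording often contains several vocalizing species at once, even though each iNatSounds clip carries only one weak label.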
πŸ”Ž Similar Papers
No similar papers found.