Merlin L48 Spectrogram Dataset

📅 2025-10-31

🤖 AI Summary

Problem: Synthetic single-positive multi-label (SPML) datasets, typically generated by randomly sampling one positive label per example from fully annotated datasets such as Pascal VOC or COCO, do not reflect real-world scenarios and miss the fine-grained complexities that cause difficult misclassifications. Method: We introduce L48, a fine-grained, real-world multi-label dataset derived from recordings of bird sounds. It provides a natural SPML setting with single-positive annotations, plus two extended settings in which domain priors give access to additional negative labels. Contribution/Results: Existing SPML methods benchmarked on L48 show significant performance differences compared to synthetic datasets, and the analysis exposes weaknesses in those methods. L48 therefore serves as a more realistic and more difficult testbed for developing weakly supervised multi-label learning algorithms.

📝 Abstract

In the single-positive multi-label (SPML) setting, each image in a dataset is labeled with the presence of a single class, while the true presence of other classes remains unknown. The challenge is to narrow the performance gap between this partially labeled setting and fully supervised learning, which often requires a significant annotation budget. Prior SPML methods were developed and benchmarked on synthetic datasets created by randomly sampling single positive labels from fully annotated datasets like Pascal VOC, COCO, NUS-WIDE, and CUB200. However, this synthetic approach does not reflect real-world scenarios and fails to capture the fine-grained complexities that can lead to difficult misclassifications. In this work, we introduce the L48 dataset, a fine-grained, real-world multi-label dataset derived from recordings of bird sounds. L48 provides a natural SPML setting with single-positive annotations on a challenging, fine-grained domain, as well as two extended settings in which domain priors give access to additional negative labels. We benchmark existing SPML methods on L48, observe significant performance differences compared to synthetic datasets, and analyze method weaknesses, underscoring the need for more realistic and difficult benchmarks.
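The synthetic construction the abstract criticizes can be illustrated with a minimal sketch: for each fully annotated example, keep one randomly chosen positive label and mark every other class as unknown. The function name `sample_single_positive`, the use of `None` for "unknown", and the toy label matrix are assumptions made for illustration only; they are not taken from the paper or from any benchmark's code.

```python
import random


def sample_single_positive(labels, seed=0):
    """Turn fully annotated multi-label rows into single-positive rows.

    labels: list of rows, each a list of 0/1 values (1 = class present).
    Returns rows holding 1 for the single retained positive and None
    (unknown) everywhere else, including for true negatives.
    """
    rng = random.Random(seed)  # local generator keeps the sampling reproducible
    observed = []
    for row in labels:
        positives = [i for i, value in enumerate(row) if value == 1]
        if not positives:
            raise ValueError("each example needs at least one positive label")
        keep = rng.choice(positives)  # uniformly pick one true positive
        observed.append([1 if i == keep else None for i in range(len(row))])
    return observed


# Hypothetical fully annotated labels: 3 examples, 4 classes.
full = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
spml = sample_single_positive(full, seed=0)
```

Because the retained label is drawn uniformly from the true positives, the observed annotation carries no information about which class an annotator would naturally report. That is one way in which this kind of sampling can differ from a naturally single-positive dataset such as the one the abstract describes.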

Problem

Research questions and friction points this paper is trying to address.

Narrowing the performance gap between single-positive and fully supervised multi-label learning

Addressing the failure of synthetic SPML datasets to reflect real-world scenarios

Providing a realistic, fine-grained SPML benchmark built from bird sound recordings

Innovation

Methods, ideas, or system contributions that make the work stand out.

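Introduces the L48 dataset, derived from recordings of bird sounds

Provides natural single-positive multi-label annotations, plus two extended settings in which domain priors supply additional negative labels

Benchmarks existing SPML methods on a fine-grained, real-world domain and analyzes their weaknesses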

🔎 Similar Papers

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task