🤖 AI Summary
AudioSet suffers from ontology-driven label inconsistency: audio events that should be positive instances are frequently mislabeled as negative, resulting in systematic under-annotation. To address this, we propose Hierarchical Label Propagation (HLP), the first method to explicitly incorporate the audio event ontology structure into label correction, deterministically propagating positive labels upward along the ontology hierarchy. HLP is architecture-agnostic and integrates seamlessly with mainstream models, including CNNs (CNN6, ConvNeXT) and Transformers (PaSST), without requiring changes to model architecture. Experiments show that HLP increases positive label density from 1.98 to 2.39 labels per clip, affecting 109 of the 527 classes. It consistently improves mean Average Precision (mAP) on both AudioSet and FSD50K, with more pronounced gains for smaller models (e.g., +1.2% mAP for CNN6), revealing an interaction between data quality and model capacity.
📝 Abstract
AudioSet is one of the largest and most widely used datasets in audio tagging, containing about 2 million audio samples manually labeled with 527 event categories organized into an ontology. However, the annotations contain inconsistencies; in particular, categories that should be labeled as positive according to the ontology are frequently mislabeled as negative. To address this issue, we apply Hierarchical Label Propagation (HLP), which propagates positive labels up the ontology hierarchy, increasing the mean number of positive labels per audio clip from 1.98 to 2.39 and affecting 109 of the 527 classes. Our results demonstrate that HLP provides performance benefits across various model architectures, including convolutional neural networks (PANNs' CNN6 and ConvNeXT) and transformers (PaSST), with smaller models showing larger improvements. Finally, on FSD50K, another widely used dataset, models trained on AudioSet with HLP consistently outperformed those trained without it. Our source code will be made available on GitHub.
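The upward propagation described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the toy ontology, parent map, and label names are hypothetical stand-ins for AudioSet's real ontology, which allows a class to have multiple parents, so ancestors are collected transitively.

```python
# Illustrative sketch of Hierarchical Label Propagation (HLP).
# The ontology below is a toy example, not AudioSet's actual hierarchy.

def propagate_labels(labels, parents):
    """Deterministically add every ancestor of each positive label."""
    propagated = set(labels)
    stack = list(labels)
    while stack:
        node = stack.pop()
        # A class may have several parents in the ontology DAG.
        for parent in parents.get(node, []):
            if parent not in propagated:
                propagated.add(parent)
                stack.append(parent)
    return propagated

# Toy chain: "Electric guitar" -> "Guitar" -> "Musical instrument" -> "Music"
parents = {
    "Electric guitar": ["Guitar"],
    "Guitar": ["Musical instrument"],
    "Musical instrument": ["Music"],
}

clip_labels = {"Electric guitar"}  # original annotation misses the ancestors
print(sorted(propagate_labels(clip_labels, parents)))
# → ['Electric guitar', 'Guitar', 'Music', 'Musical instrument']
```

Because propagation only ever adds labels implied by the ontology, it can be applied once to the annotation files as a preprocessing step, with no retraining-time cost.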