Improving Audio Event Recognition with Consistency Regularization

📅 2025-09-12

📈 Citations: 0

✨ Influential: 0

career value

186K/year

🤖 AI Summary

This work addresses the challenges of scarce labeled data and limited model generalization in audio event recognition. To this end, it introduces consistency regularization to this task for the first time, proposing a unified framework that integrates multi-augmentation strategies with semi-supervised learning. Methodologically, the approach generates diverse input views via strong data augmentation and jointly optimizes prediction consistency across both fully supervised and few-shot semi-supervised settings. A novel multi-augmentation composition mechanism is designed to enhance view diversity and robustness. Experiments on the AudioSet dataset—under a realistic scale disparity (20K labeled vs. 1.8M unlabeled samples)—demonstrate that the method significantly outperforms strong baselines. In the few-shot semi-supervised setting, it achieves substantial performance gains over the best model trained solely on limited labeled data. These results validate the effectiveness and scalability of consistency regularization for audio event recognition.

Technology Category

Application Category

📝 Abstract

Consistency regularization (CR), which enforces agreement between model predictions on augmented views, has found recent benefits in automatic speech recognition [1]. In this paper, we propose the use of consistency regularization for audio event recognition, and demonstrate its effectiveness on AudioSet. With extensive ablation studies for both small ($sim$20k) and large ($sim$1.8M) supervised training sets, we show that CR brings consistent improvement over supervised baselines which already heavily utilize data augmentation, and CR using stronger augmentation and multiple augmentations leads to additional gain for the small training set. Furthermore, we extend the use of CR into the semi-supervised setup with 20K labeled samples and 1.8M unlabeled samples, and obtain performance improvement over our best model trained on the small set.

Problem

Research questions and friction points this paper is trying to address.

Applying consistency regularization to audio event recognition

Evaluating CR effectiveness on AudioSet with supervised training

Extending CR to semi-supervised learning with unlabeled data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Consistency regularization for audio event recognition

Stronger and multiple augmentations for small datasets

Semi-supervised setup with labeled and unlabeled samples

🔎 Similar Papers

No similar papers found.