Formula-Supervised Sound Event Detection: Pre-Training Without Real Data

📅 2025-04-06

🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

To address the scarcity of high-quality annotated data and severe label noise in sound event detection (SED), this paper proposes a formula-driven supervised pretraining framework. Leveraging acoustic principles—such as frequency-modulation (FM) and amplitude-modulation (AM) models—we parameterize synthetic acoustic signals and use their underlying generative parameters as intrinsic ground-truth labels, thereby constructing Formula-SED, a large-scale, fully synthetic SED dataset. This approach enables the first large-scale, end-to-end SED pretraining without human annotations or reliance on real-world recordings. Integrated with a Transformer architecture and cross-domain transfer, our method achieves a 4.2% absolute improvement in F1-score and 40% faster convergence on DCASE 2023 Task 4 (DESED). Results demonstrate that formula-based supervision effectively mitigates annotation bias while enhancing model generalization and training efficiency.

Technology Category

Application Category

📝 Abstract

In this paper, we propose a novel formula-driven supervised learning (FDSL) framework for pre-training an environmental sound analysis model by leveraging acoustic signals parametrically synthesized through formula-driven methods. Specifically, we outline detailed procedures and evaluate their effectiveness for sound event detection (SED). The SED task, which involves estimating the types and timings of sound events, is particularly challenged by the difficulty of acquiring a sufficient quantity of accurately labeled training data. Moreover, it is well known that manually annotated labels often contain noises and are significantly influenced by the subjective judgment of annotators. To address these challenges, we propose a novel pre-training method that utilizes a synthetic dataset, Formula-SED, where acoustic data are generated solely based on mathematical formulas. The proposed method enables large-scale pre-training by using the synthesis parameters applied at each time step as ground truth labels, thereby eliminating label noise and bias. We demonstrate that large-scale pre-training with Formula-SED significantly enhances model accuracy and accelerates training, as evidenced by our results in the DESED dataset used for DCASE2023 Challenge Task 4. The project page is at https://yutoshibata07.github.io/Formula-SED/

Problem

Research questions and friction points this paper is trying to address.

Pre-training sound event detection models without real data

Addressing label noise and bias in sound event detection

Enhancing model accuracy with synthetic formula-driven datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Formula-driven supervised learning for pre-training

Synthetic dataset eliminates label noise

Parametric synthesis with mathematical formulas

🔎 Similar Papers

No similar papers found.