Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection

📅 2024-11-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

217K/year

🤖 AI Summary

To address the significant performance degradation of sound event detection (SED) under unknown noise interference, this paper proposes a novel LLM-driven, text-queryable, and noise-robust SED paradigm. Methodologically, it introduces large language models (LLMs) for the first time to jointly perform noise-type identification and adaptive noise augmentation, synergistically integrated with language-guided audio source separation (LASS) to enable semantic-aware overlapping sound source separation and event localization. Additionally, it devises a clip-wise text query generation mechanism and a noise-aware data augmentation strategy. Evaluated on standard SED benchmarks—including URBAN-SED and DCASE2023—the method achieves an average F1-score improvement of 12.6% under unknown noise conditions, substantially outperforming conventional LASS-based and fully supervised SED approaches. The code, pre-trained models, and text-query dataset are publicly released.

Technology Category

Application Category

📝 Abstract

Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events. Language-queried audio source separation (LASS) aims to isolate the target sound events from a noisy clip. However, this approach can fail when the exact target sound is unknown, particularly in noisy test sets, leading to reduced performance. To address this issue, we leverage the capabilities of large language models (LLMs) to analyze and summarize acoustic data. By using LLMs to identify and select specific noise types, we implement a noise augmentation method for noise-robust fine-tuning. The fine-tuned model is applied to predict clip-wise event predictions as text queries for the LASS model. Our studies demonstrate that the proposed method improves SED performance in noisy environments. This work represents an early application of LLMs in noise-robust SED and suggests a promising direction for handling overlapping events in SED. Codes and pretrained models are available at https://github.com/apple-yinhan/Noise-robust-SED.

Problem

Research questions and friction points this paper is trying to address.

Sound Event Detection

Noisy Environment

Traditional LASS Method

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Sound Event Detection

Noisy Environment

🔎 Similar Papers

Exploring Text-Queried Sound Event Detection with Audio Source Separation