🤖 AI Summary
Traditional audio denoising methods rely on fixed definitions of target and noise signals, which limits their adaptability to different auditory contexts and constrains performance. This work proposes automatic contextual audio denoising (ACAD), which defines target and noise according to a context inferred from the audio itself. ACAD employs an end-to-end deep learning model that jointly performs acoustic scene classification and context-adaptive noise suppression, distinguishing in-context (IC) events that are typical for the scene from out-of-context (OC) noise and attenuating only the latter. On a multi-scenario paired clean/noisy dataset, ACAD outperforms three baseline variants (no context inference, oracle context, and separately provided uninformative context) across standard objective evaluation metrics.
📝 Abstract
Audio context determines which sound components and sources are relevant to listeners and which are perceived as irrelevant (noise). For example, traffic noise is informative in urban surveillance but is noise for a phone call at the same location. Most current audio denoising systems apply fixed target-noise definitions, often removing useful components in one context while failing to suppress irrelevant components in another. To address this, we introduce the concept of automatic contextual audio denoising (ACAD), which defines target and noise based on the inferred context. In this work, we restrict context to be associated with an acoustic scene class. We label sound events typical for a scene class as in-context (IC) and events outside the event distribution of that scene class, which are treated as noise, as out-of-context (OC). We implement a deep learning method that automatically infers the context of the audio signal and removes OC components, and we benchmark it against three variants: without context inference, with oracle context, and with separately provided uninformative context. On paired clean/noisy data across diverse contexts, where OC components in one context may be IC in another, our proposed method outperforms the other approaches across standard objective metrics, indicating that the model can infer context and that context-dependent processing can enhance denoising.
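The IC/OC labeling rule described above can be illustrated with a minimal sketch. It is not the paper's model: the scene names, the event lists, and the `split_events` helper are all hypothetical, chosen only to show how the same event can be IC in one scene and OC in another.

```python
# Hypothetical scene-to-event table (assumption): the abstract does not list
# the actual scene classes or their event distributions.
SCENE_EVENTS = {
    "urban_surveillance": {"traffic", "siren", "speech"},
    "phone_call": {"speech"},
}


def split_events(scene, events):
    """Partition events into in-context (IC) and out-of-context (OC) lists.

    An event is IC if it belongs to the scene's typical event set, else OC.
    """
    typical = SCENE_EVENTS[scene]
    ic = [e for e in events if e in typical]
    oc = [e for e in events if e not in typical]
    return ic, oc


# The same mixture is labeled differently depending on the context.
events = ["speech", "traffic"]
ic_call, oc_call = split_events("phone_call", events)
ic_city, oc_city = split_events("urban_surveillance", events)
```

In this toy setting, `traffic` ends up in `oc_call` but in `ic_city`, mirroring the traffic example in the abstract. In the described method the scene is inferred from the audio rather than passed in by hand, and removal operates on the signal rather than on event labels.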