Unsupervised Multi-channel Speech Dereverberation via Diffusion

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses blind multi-channel single-speaker dereverberation: recovering anechoic direct-path speech from reverberant multi-channel recordings alone. It proposes USD-DPS, a framework that uses an unconditional clean-speech diffusion model as a prior and performs dereverberation by posterior sampling under a multi-channel mixture consistency constraint. At each sampling step, the room impulse responses (RIRs) of all microphone channels are estimated without supervised RIR labels or paired clean speech: the reference-channel RIR is obtained by optimizing the parameters of a sub-band RIR signal model with Adam, and the non-reference-channel RIRs are computed analytically with forward convolutive prediction (FCP). In the fully unsupervised setting, USD-DPS outperforms existing unsupervised dereverberation methods, with consistent improvements in both objective metrics (e.g., STOI, PESQ) and subjective quality. An audio demo page is publicly available.

📝 Abstract

We consider the problem of multi-channel single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS, Unsupervised Speech Dereverberation via Diffusion Posterior Sampling. USD-DPS uses an unconditional clean speech diffusion model as a strong prior to solve the problem by posterior sampling. At each diffusion sampling step, we estimate the room impulse responses (RIRs) of all microphone channels, which are then used to enforce a multi-channel mixture consistency constraint for diffusion guidance. For multi-channel RIR estimation, we estimate the reference-channel RIR by optimizing the parameters of a sub-band RIR signal model with the Adam optimizer, and we estimate the non-reference channels' RIRs analytically using forward convolutive prediction (FCP). We found that this combination provides a good balance between sampling efficiency and RIR prior modeling, and it shows superior performance among unsupervised dereverberation approaches. An audio demo page is provided at https://usddps.github.io/USDDPS_demo/.
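The analytic FCP step mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes a simplified, unweighted least-squares form: for each frequency band, a short filter over past frames of the current clean-speech estimate is fitted to one non-reference microphone channel. The function names (`stack_past_frames`, `fcp_filters`, `apply_filters`), the array shapes, and the absence of any weighting term are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def stack_past_frames(S, taps):
    """Return an array of shape (T, F, taps); slice [..., k] is S delayed by k frames."""
    T, F = S.shape
    out = np.zeros((T, F, taps), dtype=complex)
    for k in range(taps):
        out[k:, :, k] = S[:T - k, :]   # zero-padded delay of k frames
    return out

def fcp_filters(S, Y, taps):
    """Unweighted least-squares filters G of shape (F, taps) such that
    Y[t, f] is approximated by sum_k G[f, k] * S[t - k, f]."""
    S_stacked = stack_past_frames(S, taps)
    F = S.shape[1]
    G = np.zeros((F, taps), dtype=complex)
    for f in range(F):
        # Closed-form solve, one independent problem per frequency band.
        G[f], *_ = np.linalg.lstsq(S_stacked[:, f, :], Y[:, f], rcond=None)
    return G

def apply_filters(S, G):
    """Re-reverberate the source estimate S (T, F) with the filters G (F, taps)."""
    S_stacked = stack_past_frames(S, G.shape[1])
    return np.einsum("tfk,fk->tf", S_stacked, G)

# Toy usage on synthetic STFT-like data (hypothetical sizes).
rng = np.random.default_rng(0)
S = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))
G_true = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
Y = apply_filters(S, G_true)
G_est = fcp_filters(S, Y, taps=3)
```

Because each band is a small linear least-squares problem, this step needs no iterative optimization, which is consistent with the abstract's point about sampling efficiency.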

Problem

Research questions and friction points this paper is trying to address.

Unsupervised multi-channel speech dereverberation using diffusion models

Estimating room impulse responses for clean speech recovery

Combining RIR optimization with multi-channel consistency constraints
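
The three points above can be written compactly. The following is a sketch under an assumed sub-band (STFT-domain) convolutive model with generic diffusion-posterior-sampling guidance; the symbols $Y_c$, $H_c$, $S$, $K$, $C$, and $\zeta$ are illustrative notation and are not taken from the paper.

```latex
% Assumed sub-band convolutive model for microphone channel c:
Y_c(t,f) \approx \sum_{k=0}^{K-1} H_c(k,f)\, S(t-k,f), \qquad c = 1,\dots,C

% Multi-channel mixture consistency loss for a clean-speech estimate \hat{S}:
\mathcal{L}(\hat{S}, H) = \sum_{c=1}^{C} \sum_{t,f}
  \Bigl| Y_c(t,f) - \sum_{k=0}^{K-1} H_c(k,f)\, \hat{S}(t-k,f) \Bigr|^2

% Generic posterior-sampling guidance at diffusion state x_\tau:
\nabla_{x_\tau} \log p(x_\tau \mid Y) \approx
  \nabla_{x_\tau} \log p(x_\tau)
  - \zeta\, \nabla_{x_\tau} \mathcal{L}\bigl(\hat{S}_0(x_\tau), H\bigr)
```

Here $\hat{S}_0(x_\tau)$ stands for the denoised clean-speech estimate at the current diffusion step and $\zeta$ for a guidance step size.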

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised speech dereverberation via diffusion

Multi-channel mixture consistency constraint guidance

Sub-band RIR signal model with Adam optimizer
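
As a rough illustration of fitting filter parameters with Adam under a mixture consistency loss, the sketch below optimizes the complex taps of a single frequency band for one channel. It assumes the simplest possible parameterization (free complex taps); the paper's sub-band RIR signal model is more structured and is not reproduced here. The names `delayed_frames`, `consistency_loss`, and `fit_taps_adam`, and all hyperparameter values, are hypothetical.

```python
import numpy as np

def delayed_frames(s, taps):
    """Matrix of shape (T, taps) whose k-th column is s delayed by k frames."""
    T = s.shape[0]
    X = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        X[k:, k] = s[:T - k]
    return X

def consistency_loss(X, y, g):
    """Mean squared error between the observed band y and the re-reverberated estimate X @ g."""
    return float(np.mean(np.abs(y - X @ g) ** 2))

def fit_taps_adam(s, y, taps, steps=1000, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Fit complex filter taps with a hand-written Adam loop (assumed free-tap model)."""
    X = delayed_frames(s, taps)
    T = len(y)
    theta = np.zeros(2 * taps)   # real parts followed by imaginary parts of the taps
    m = np.zeros_like(theta)     # first-moment estimate
    v = np.zeros_like(theta)     # second-moment estimate
    for t in range(1, steps + 1):
        g = theta[:taps] + 1j * theta[taps:]
        r = X @ g - y
        # Gradient of the mean squared error: the real part is the derivative
        # w.r.t. Re(g), the imaginary part is the derivative w.r.t. Im(g).
        grad_c = 2.0 * (X.conj().T @ r) / T
        grad = np.concatenate([grad_c.real, grad_c.imag])
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        m_hat = m / (1 - b1 ** t)   # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta[:taps] + 1j * theta[taps:]

# Toy usage: one frequency band of a synthetic source and its reverberant observation.
rng = np.random.default_rng(0)
s = rng.standard_normal(300) + 1j * rng.standard_normal(300)
g_true = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = delayed_frames(s, 4) @ g_true
g_est = fit_taps_adam(s, y, taps=4)
```

In a posterior-sampling loop, a few such optimizer steps would be taken per diffusion step against the current clean-speech estimate rather than run to convergence; how many steps to take is a tuning choice that this sketch does not address.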

🔎 Similar Papers

Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models