Schrödinger Bridge Mamba for One-Step Speech Enhancement

📅 2025-10-19
🤖 AI Summary
To address the latency that multi-step inference imposes on generative speech enhancement, this paper proposes Schrödinger Bridge Mamba (SBM), a training-inference framework that pairs the Schrödinger Bridge (SB) training paradigm with the selective state-space model Mamba. Motivated by the inherent compatibility between the two, SBM performs generative enhancement with a single inference step. On a joint denoising and dereverberation task across four benchmark datasets, SBM with 1-step inference outperforms strong baselines that use either 1-step or iterative inference, and it achieves the best real-time factor (RTF). The authors also present the pairing of SB training with selective state-space architectures as a promising direction for generative models beyond speech enhancement.

📝 Abstract
We propose Schrödinger Bridge Mamba (SBM), a new training-inference framework concept motivated by the inherent compatibility between the Schrödinger Bridge (SB) training paradigm and the selective state-space model Mamba. We exemplify the concept of SBM with an implementation for generative speech enhancement. Experiments on a joint denoising and dereverberation task using four benchmark datasets demonstrate that SBM, with only 1-step inference, outperforms strong baselines with 1-step or iterative inference and achieves the best real-time factor (RTF). Beyond speech enhancement, we discuss the integration of the SB paradigm and the selective state-space model architecture based on their underlying alignment, which indicates a promising direction for exploring new deep generative models potentially applicable to a broad range of generative tasks. Demo page: https://sbmse.github.io
Problem

Research questions and friction points this paper is trying to address.

Proposes a one-step speech enhancement framework combining Schrödinger Bridge and Mamba
Achieves superior denoising and dereverberation performance with single-step inference
Explores integration of selective state-space models for broader generative applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Schrödinger Bridge Mamba combines SB training with Mamba
One-step inference outperforms multi-step baselines
Integration enables real-time generative speech enhancement
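
The real-time factor (RTF) cited in the abstract is conventionally the ratio of processing time to the duration of the audio processed, so values below 1 mean the system runs faster than real time. The sketch below shows one way this could be measured. The names `rtf_from_timings`, `real_time_factor`, and the `enhance` callable are hypothetical placeholders for illustration, not the authors' code or measurement protocol.

```python
import time
from typing import Callable, Sequence


def rtf_from_timings(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def real_time_factor(
    enhance: Callable[[Sequence[float]], Sequence[float]],  # hypothetical model call
    audio: Sequence[float],
    sample_rate: int,
) -> float:
    """Time a single call to `enhance` and return its RTF for this clip."""
    if sample_rate <= 0 or len(audio) == 0:
        raise ValueError("need a non-empty clip and a positive sample rate")
    start = time.perf_counter()
    enhance(audio)  # one forward pass for a 1-step model; N passes for an N-step sampler
    elapsed = time.perf_counter() - start
    return rtf_from_timings(elapsed, len(audio) / sample_rate)


if __name__ == "__main__":
    # Stand-in "enhancer" that copies its input; a real model would go here.
    one_second_clip = [0.0] * 16000
    print(real_time_factor(lambda x: list(x), one_second_clip, sample_rate=16000))
```

Because an iterative sampler calls the network once per step, its processing time grows roughly with the number of steps, which is why one-step inference tends to help RTF. In practice, reported RTF values also depend on hardware, batch size, and warm-up runs, none of which this sketch controls.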
Jing Yang
Central Media Technology Institute, Huawei
Sirui Wang
Meituan
NLP, LLM
Chao Wu
Central Media Technology Institute, Huawei
Fan Fan
Central Media Technology Institute, Huawei