Schrödinger Bridge Mamba for One-Step Speech Enhancement

📅 2025-10-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the poor real-time performance caused by multi-step inference in generative speech enhancement, this paper proposes Schrödinger Bridge Mamba (SBM), a training-inference framework that pairs the Schrödinger Bridge (SB) training paradigm with the selective state-space model Mamba, motivated by the inherent compatibility between the two. Implemented for generative speech enhancement, SBM needs only one inference step. On a joint denoising and dereverberation task across four benchmark datasets, it outperforms strong baselines that use either 1-step or iterative inference and achieves the best real-time factor (RTF). The authors also suggest that combining the SB paradigm with selective state-space architectures is a promising direction for other generative tasks.

📝 Abstract

We propose Schrödinger Bridge Mamba (SBM), a new concept for a training-inference framework, motivated by the inherent compatibility between the Schrödinger Bridge (SB) training paradigm and the selective state-space model Mamba. We exemplify the concept of SBM with an implementation for generative speech enhancement. Experiments on a joint denoising and dereverberation task using four benchmark datasets demonstrate that SBM, with only 1-step inference, outperforms strong baselines with 1-step or iterative inference and achieves the best real-time factor (RTF). Beyond speech enhancement, we discuss the integration of the SB paradigm and the selective state-space model architecture based on their underlying alignment, which indicates a promising direction for exploring new deep generative models potentially applicable to a broad range of generative tasks. Demo page: https://sbmse.github.io
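The abstract reports the real-time factor (RTF) without defining it. A minimal sketch, assuming the common definition of RTF as processing time divided by audio duration (values below 1 mean faster than real time); the paper's exact measurement protocol is not stated here, and `real_time_factor`, `measure_rtf`, and the `enhance` callable are hypothetical names used only for illustration:

```python
import time


def real_time_factor(process_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (assumed common definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return process_seconds / audio_seconds


def measure_rtf(enhance, samples, sample_rate: int) -> float:
    """Time one call to a hypothetical `enhance` function and return its RTF."""
    start = time.perf_counter()
    enhance(samples)  # stand-in for a single enhancement pass over the signal
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    one_second_of_silence = [0.0] * 16000  # 1 s at an assumed 16 kHz rate
    # Identity function used as a placeholder model, not the authors' network.
    print(measure_rtf(lambda x: x, one_second_of_silence, 16000))
```

Under this definition, a model that needs a single network evaluation per utterance has a direct advantage over an iterative sampler built on the same network, since the iterative sampler repeats that evaluation once per step.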

Problem

Research questions and friction points this paper is trying to address.

Multi-step inference in generative speech enhancement limits real-time performance

Whether single-step inference can match or exceed 1-step and iterative baselines on joint denoising and dereverberation

How the Schrödinger Bridge paradigm and selective state-space models can be integrated for broader generative applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Schrödinger Bridge Mamba combines SB training with Mamba

One-step inference outperforms strong 1-step and iterative baselines on four benchmark datasets and achieves the best real-time factor (RTF)

Integration enables real-time generative speech enhancement

🔎 Similar Papers

An Investigation of Incorporating Mamba For Speech Enhancement