🤖 AI Summary
This work addresses the limitation of existing Mamba-based speech separation approaches, which model spectrograms along only a single dimension and thus struggle to capture two-dimensional global dependencies. To overcome this, the study introduces omnidirectional attention into the Mamba architecture for the first time, enabling efficient modeling of global time-frequency dependencies by traversing the spectrogram in ten distinct directions. The proposed method integrates selective state space models with omnidirectional attention to construct an end-to-end time-frequency domain speech separation system. Evaluated on three public datasets, it consistently outperforms current baselines and state-of-the-art methods, demonstrating both its effectiveness and scalability while maintaining linear computational complexity.
📝 Abstract
Mamba, a selective state-space model (SSM), has emerged as an efficient alternative to Transformers for speech modeling, enabling long-sequence processing with linear complexity. While effective in speech separation, existing approaches, whether in the time or time-frequency domain, typically decompose the input along a single dimension into short one-dimensional sequences before processing them with Mamba, which restricts Mamba to local 1D modeling and limits its ability to capture global dependencies across the 2D spectrogram. In this work, we propose an efficient omni-directional attention (OA) mechanism built upon unidirectional Mamba, which models global dependencies across the spectrogram from ten different directions. We integrate the proposed mechanism into two baseline separation models and evaluate them on three public datasets. Experimental results show that our approach consistently achieves significant performance gains over the baselines while preserving linear complexity, outperforming existing state-of-the-art (SOTA) systems.
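The core idea of multi-directional scanning can be sketched as follows: flatten the 2D (time × frequency) spectrogram into several 1D sequences, each following a different traversal order, so that a unidirectional sequence model like Mamba sees global 2D context one direction at a time. This is a minimal NumPy sketch under stated assumptions: the five scan orders shown (time forward/backward, frequency forward/backward, one diagonal sweep) are illustrative choices, not the paper's actual direction set, which comprises ten directions the abstract does not enumerate.

```python
import numpy as np

def directional_scans(spec: np.ndarray) -> dict:
    """Flatten a (time x freq) spectrogram into 1D sequences along several
    hypothetical scan directions. Each sequence visits every bin exactly
    once, just in a different global order."""
    T, F = spec.shape
    # Diagonal sweep: concatenate diagonals from lower-left to upper-right.
    diag_fwd = np.concatenate(
        [np.diagonal(spec, offset=k) for k in range(-(T - 1), F)]
    )
    return {
        "time_fwd": spec.reshape(-1),        # row-major: frame by frame, low to high freq
        "time_bwd": spec.reshape(-1)[::-1],  # same path, reversed
        "freq_fwd": spec.T.reshape(-1),      # column-major: bin by bin, over time
        "freq_bwd": spec.T.reshape(-1)[::-1],
        "diag_fwd": diag_fwd,
    }

# Example: a tiny 2x3 "spectrogram" with entries 0..5
spec = np.arange(6).reshape(2, 3)
scans = directional_scans(spec)
```

Each direction yields a different permutation of the same T·F bins; a 1D SSM run over each sequence (with outputs fused afterwards) is one plausible way such a mechanism could expose global time-frequency dependencies to a model that is otherwise strictly unidirectional.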