🤖 AI Summary
Speech separation faces high memory consumption and latency when modeling long sequences, owing to the quadratic complexity of standard Transformer attention. To address this, we propose Focused Linear Attention (FLA), a linear-complexity attention mechanism integrated into a novel Transformer architecture, FLASepformer, which combines gated mechanisms with focused linear attention to preserve global modeling capability while achieving substantial computational efficiency gains. The design comes in two variants: FLA-SepReformer, which follows the block-wise modeling strategy of SepReformer, and FLA-TFLocoformer, which follows the local-global synergy principle of TF-Locoformer. Experiments on multiple benchmark datasets show separation performance on par with the state of the art (SOTA). Moreover, FLA-SepReformer achieves 1.49–2.29× faster inference and cuts GPU memory usage to 15.8–31.9% of the original footprint (a reduction of 68.1–84.2%), significantly advancing the practical deployment of long-duration speech separation systems.
📝 Abstract
Speech separation must cope with long time sequences. Past methods reduce sequence lengths and use the Transformer to capture global information. However, due to the quadratic time complexity of the attention module, memory usage and inference time still grow significantly with longer segments. To tackle this, we introduce Focused Linear Attention and build FLASepformer, a linear-complexity architecture for efficient speech separation. Inspired by SepReformer and TF-Locoformer, we present two variants: FLA-SepReformer and FLA-TFLocoformer. We also add a new Gated module to further improve performance. Experimental results on various datasets show that FLASepformer matches state-of-the-art performance with less memory consumption and faster inference: FLA-SepReformer-T/B/L speeds up inference by 2.29x, 1.91x, and 1.49x while using only 15.8%, 20.9%, and 31.9% of the GPU memory, proving our model's effectiveness.