🤖 AI Summary
The FAVOR+ approximate attention used in DF-Conformer keeps complexity linear in the sequence length, but the approximation limits its global sequence modeling capability. To address this, we propose Hydra-Genhancer, a generative speech enhancement framework that replaces FAVOR+ with a bidirectional selective structured state space model (Hydra), removing the attention approximation while preserving O(L) time complexity and improving long-range dependency modeling. Hydra is integrated into the Genhancer architecture to enable efficient, high-fidelity reconstruction over discrete codec token sequences. Experiments demonstrate consistent and substantial improvements over DF-Conformer on objective speech quality metrics (PESQ and STOI) as well as naturalness scores, particularly under low signal-to-noise ratio conditions. Hydra-Genhancer thus offers a lightweight approach to generative speech enhancement that combines computational efficiency with perceptual fidelity.
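To make the linear-complexity versus approximation trade-off concrete, the following is a minimal NumPy sketch of FAVOR+-style attention with positive random features, compared against exact softmax attention. It is an illustration under stated assumptions, not the DF-Conformer implementation: the projection matrix `w` is drawn i.i.d. Gaussian (FAVOR+ additionally orthogonalizes it), no numerical stabilization is applied, and the sizes `L`, `d`, `m` and all function names are hypothetical.

```python
import numpy as np

def positive_random_features(x, w):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m); every feature is strictly positive."""
    m = w.shape[0]
    proj = x @ w.T                                         # (L, m)
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def linear_attention(q, k, v, w):
    """Approximate softmax attention without forming the L x L score matrix."""
    scale = q.shape[-1] ** -0.25                           # splits the 1/sqrt(d) factor over q and k
    q_f = positive_random_features(q * scale, w)           # (L, m)
    k_f = positive_random_features(k * scale, w)           # (L, m)
    kv = k_f.T @ v                                         # (m, d_v), cost linear in L
    normalizer = q_f @ k_f.sum(axis=0)                     # (L,), approximate softmax denominator
    return (q_f @ kv) / normalizer[:, None]

def softmax_attention(q, k, v):
    """Exact attention; cost is quadratic in L."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

rng = np.random.default_rng(0)
L, d, m = 64, 8, 2048                                      # hypothetical sizes
q, k, v = (0.5 * rng.standard_normal((L, d)) for _ in range(3))
w = rng.standard_normal((m, d))                            # assumption: i.i.d. Gaussian, not orthogonalized

approx = linear_attention(q, k, v, w)
exact = softmax_attention(q, k, v)
max_err = np.abs(approx - exact).max()                     # nonzero: the kernel is only estimated
```

The gap `max_err` shrinks as the number of random features `m` grows but does not vanish for finite `m`, which is the approximation the summary refers to.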
📝 Abstract
The Dilated FAVOR Conformer (DF-Conformer) is an efficient variant of the Conformer architecture designed for speech enhancement (SE). It employs fast attention via positive orthogonal random features (FAVOR+) to mitigate the quadratic complexity of self-attention, and uses dilated convolution to expand the receptive field. This combination has yielded strong performance across various SE models. In this paper, we propose replacing FAVOR+ with bidirectional selective structured state-space sequence models to achieve two main objectives: (1) enhancing global sequential modeling by eliminating the approximations inherent in FAVOR+, and (2) maintaining linear complexity with respect to the sequence length. Specifically, we use Hydra, a bidirectional extension of Mamba formulated within the structured matrix mixer framework. Experiments with Genhancer, a generative SE model operating on discrete codec tokens, demonstrate that the proposed method outperforms DF-Conformer.
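The idea of a bidirectional state-space layer viewed as a structured matrix mixer can be sketched with a toy example: a forward scan, a backward scan, and a diagonal term together compute an exact product with a dense L x L matrix, yet in O(L) time. The code below is a simplified scalar-state illustration and not the Hydra layer itself, which differs in details such as state dimension, treatment of the diagonal, and learned projections. The function names and the `selective_params` parameterization are hypothetical.

```python
import numpy as np

def causal_scan(x, a, b, c):
    """Recurrence h_t = a_t * h_{t-1} + b_t * x_t, output y_t = c_t * h_t, in O(L)."""
    h = 0.0
    y = np.empty(len(x))
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def bidirectional_scan(x, fwd, bwd, d):
    """Sum of a left-to-right scan, a right-to-left scan, and a diagonal term."""
    y_fwd = causal_scan(x, *fwd)
    # Run the same recurrence right-to-left by flipping inputs and outputs.
    y_bwd = causal_scan(x[::-1], *(p[::-1] for p in bwd))[::-1]
    return y_fwd + y_bwd + d * x

def dense_mixer(fwd, bwd, d):
    """Explicit L x L matrix that bidirectional_scan multiplies by implicitly."""
    (a_f, b_f, c_f), (a_b, b_b, c_b) = fwd, bwd
    L = len(d)
    M = np.diag(d).astype(float)
    for i in range(L):
        for j in range(L):
            if j <= i:   # lower triangle (and diagonal) from the forward scan
                M[i, j] += c_f[i] * np.prod(a_f[j + 1:i + 1]) * b_f[j]
            if j >= i:   # upper triangle (and diagonal) from the backward scan
                M[i, j] += c_b[i] * np.prod(a_b[i:j]) * b_b[j]
    return M

def selective_params(x, w):
    """Hypothetical input-dependent ("selective") parameters for illustration."""
    a = 1.0 / (1.0 + np.exp(-w[0] * x))   # per-step decay in (0, 1)
    b = np.tanh(w[1] * x)                 # per-step input gate
    c = np.tanh(w[2] * x)                 # per-step output gate
    return a, b, c

rng = np.random.default_rng(0)
L = 16
x = rng.standard_normal(L)
fwd = selective_params(x, rng.standard_normal(3))
bwd = selective_params(x, rng.standard_normal(3))
d = rng.standard_normal(L)

y_scan = bidirectional_scan(x, fwd, bwd, d)   # O(L) computation
y_dense = dense_mixer(fwd, bwd, d) @ x        # O(L^2) reference
```

Unlike a random-feature approximation of attention, the scan and the dense product agree up to floating-point error, so every position mixes with every other position exactly while the cost stays linear in the sequence length.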