Improving DF-Conformer Using Hydra For High-Fidelity Generative Speech Enhancement on Discrete Codec Token

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
DF-Conformer’s FAVOR+ approximate attention mechanism compromises global sequence modeling and struggles to balance accuracy with linear computational complexity. To address this, we propose Hydra-Genhancer: a generative speech enhancement framework that replaces FAVOR+ with a bidirectional selective structured state-space model (Hydra), eliminating the approximation error while preserving O(L) time complexity and improving long-range dependency modeling. Hydra is integrated into the Genhancer architecture to enable efficient, high-fidelity reconstruction over discrete codec token sequences. Experiments show consistent improvements over DF-Conformer on objective speech quality metrics (PESQ and STOI) and naturalness scores, particularly at low signal-to-noise ratios. Hydra-Genhancer thus offers a lightweight generative speech enhancement approach that combines exact linear-time global sequence modeling with perceptual fidelity.
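
To make the proposed replacement concrete, below is a minimal sketch of a bidirectional selective SSM mixer in the spirit of Hydra, written in PyTorch. The class names (SelectiveSSM, BidirectionalSSM), the diagonal state parameterization, and the combination rule (forward scan plus time-reversed scan plus a diagonal skip term) are illustrative assumptions rather than the paper's implementation; the actual Hydra layer is derived from a quasiseparable matrix mixer and uses a hardware-efficient parallel scan.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSM(nn.Module):
    """Causal selective scan with a diagonal state matrix (Mamba-style, simplified)."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))  # A = -exp(A_log) < 0
        self.B_proj = nn.Linear(d_model, d_state)   # input-dependent B_t (selectivity)
        self.C_proj = nn.Linear(d_model, d_state)   # input-dependent C_t
        self.dt_proj = nn.Linear(d_model, d_model)  # input-dependent step size

    def forward(self, x):                                  # x: (B, L, D)
        bsz, L, D = x.shape
        dt = F.softplus(self.dt_proj(x))                   # positive step sizes (B, L, D)
        A = -torch.exp(self.A_log)                         # stable negative poles (D, N)
        Bt, Ct = self.B_proj(x), self.C_proj(x)            # (B, L, N)
        h = x.new_zeros(bsz, D, self.A_log.shape[1])       # hidden state (B, D, N)
        ys = []
        for t in range(L):  # O(L) recurrence written out; real code uses a parallel scan
            dA = torch.exp(dt[:, t, :, None] * A)          # discretized decay (B, D, N)
            dB = dt[:, t, :, None] * Bt[:, t, None, :]     # discretized input  (B, D, N)
            h = dA * h + dB * x[:, t, :, None]
            ys.append((h * Ct[:, t, None, :]).sum(-1))     # readout -> (B, D)
        return torch.stack(ys, dim=1)                      # (B, L, D)

class BidirectionalSSM(nn.Module):
    """Non-causal mixer: forward scan + time-reversed scan + diagonal skip,
    keeping O(L) cost while letting every frame see the whole sequence."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.fwd = SelectiveSSM(d_model, d_state)
        self.bwd = SelectiveSSM(d_model, d_state)
        self.skip = nn.Parameter(torch.ones(d_model))      # diagonal of the mixer matrix

    def forward(self, x):                                  # x: (B, L, D)
        return self.fwd(x) + self.bwd(x.flip(1)).flip(1) + self.skip * x
```

The sequential loop spells out the O(L) recurrence explicitly; production implementations replace it with an associative parallel scan without changing the function being computed.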

📝 Abstract
The Dilated FAVOR Conformer (DF-Conformer) is an efficient variant of the Conformer architecture designed for speech enhancement (SE). It employs fast attention through positive orthogonal random features (FAVOR+) to mitigate the quadratic complexity associated with self-attention, while utilizing dilated convolution to expand the receptive field. This combination results in impressive performance across various SE models. In this paper, we propose replacing FAVOR+ with bidirectional selective structured state-space sequence models to achieve two main objectives: (1) enhancing global sequential modeling by eliminating the approximations inherent in FAVOR+, and (2) maintaining linear complexity relative to the sequence length. Specifically, we utilize Hydra, a bidirectional extension of Mamba, framed within the structured matrix mixer framework. Experiments conducted using a generative SE model on discrete codec tokens, known as Genhancer, demonstrate that the proposed method surpasses the performance of the DF-Conformer.
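
The abstract places this swap inside a Conformer-style block that also uses dilated convolution to widen the receptive field. The sketch below shows one plausible block layout, assuming the standard macaron Conformer recipe; the block name, feed-forward width, kernel size, and dilation are assumptions, and DF-Conformer's exact block may differ. Any (batch, length, channels)-to-(batch, length, channels) mixer, such as the BidirectionalSSM sketched above, can be dropped in where the FAVOR+ attention module used to sit.

```python
import torch
import torch.nn as nn

class DFConformerStyleBlock(nn.Module):
    """Conformer-style block: macaron FFNs + token mixer + dilated depthwise conv.
    `mixer` is any (B, L, D) -> (B, L, D) module, e.g. BidirectionalSSM."""
    def __init__(self, d_model: int, mixer: nn.Module,
                 kernel_size: int = 15, dilation: int = 2):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(d_model),
                                 nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ffn1, self.ffn2 = ffn(), ffn()
        self.mix_norm = nn.LayerNorm(d_model)
        self.mixer = mixer  # replaces FAVOR+ attention in the original DF-Conformer
        self.conv_norm = nn.LayerNorm(d_model)
        pad = dilation * (kernel_size - 1) // 2
        self.dw_conv = nn.Conv1d(d_model, d_model, kernel_size,
                                 padding=pad, dilation=dilation, groups=d_model)
        self.out_norm = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (B, L, D)
        x = x + 0.5 * self.ffn1(x)
        x = x + self.mixer(self.mix_norm(x))     # linear-time global mixing
        y = self.conv_norm(x).transpose(1, 2)    # (B, D, L) for Conv1d
        x = x + self.dw_conv(y).transpose(1, 2)  # dilated local receptive field
        x = x + 0.5 * self.ffn2(x)
        return self.out_norm(x)
```

For example, `block = DFConformerStyleBlock(256, BidirectionalSSM(256))` yields a block whose cost grows linearly with sequence length, since every sub-module is either pointwise, a fixed-kernel convolution, or an O(L) scan.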
Problem

Research questions and friction points this paper is trying to address.

Enhancing global sequential modeling in speech enhancement
Maintaining linear complexity relative to sequence length
Improving generative speech enhancement on discrete codec tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaced FAVOR+ with bidirectional selective state-space models
Used Hydra extension of Mamba for global sequence modeling
Maintained linear complexity while improving speech enhancement performance (see the timing sketch below)
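
As a rough, hedged illustration of the linear-complexity point (an illustrative micro-benchmark of the sketches above, not a result from the paper): because the reference mixer is an O(L) scan, doubling the sequence length should roughly double its runtime, unlike the O(L^2) growth of exact self-attention.

```python
# Illustrative timing only; uses the BidirectionalSSM sketch defined earlier.
import time
import torch

mixer = BidirectionalSSM(d_model=64).eval()
for L in (256, 512, 1024):        # doubling L should roughly double the time
    x = torch.randn(1, L, 64)
    t0 = time.perf_counter()
    with torch.no_grad():
        mixer(x)
    print(f"L={L:5d}  {time.perf_counter() - t0:.3f}s")
```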
Shogo Seki (CyberAgent, Inc.; Acoustic signal processing)
Shaoxiang Dang (AI Lab, CyberAgent, Tokyo, Japan)
Li Li (AI Lab, CyberAgent, Tokyo, Japan)