Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction

๐Ÿ“… 2026-02-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses a limitation of conventional deep learning approaches: they model speech phase in a flat Euclidean space, which fails to capture the phase's intrinsic circular topology and thereby constrains phase reconstruction performance. To overcome this, the authors propose a manifold-aware magnitude-phase dual-stream framework that introduces global rotation equivariance (GRE) into phase modeling for speech enhancement, a first in the field. They design two novel components, a Magnitude-Phase Interactive Convolutional Module (MPICM) and a Hybrid-Attention Dual-FFN (HADF) bottleneck. Both preserve GRE in the phase stream while enabling deep fusion of magnitude and phase representations. Experimental results show that the proposed method reduces Phase Distance by over 20% in the phase retrieval task, improves PESQ by more than 0.1 in zero-shot cross-corpus denoising, and achieves superior performance in universal speech enhancement under mixed distortions.
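To make the GRE property concrete, here is a minimal sketch. It is not the authors' implementation. It assumes phase-stream features are complex numbers z and calls a map f globally rotation equivariant when f(e^{jθ}z) = e^{jθ}f(z) for every angle θ. In other words, rotating all inputs by the same phase rotates all outputs by that phase. The helper names (`complex_linear`, `rotate`, `is_gre`) are hypothetical. The check shows that a bias-free complex linear map satisfies the property and that adding a complex bias breaks it.

```python
import cmath
import random

# Hypothetical illustration (not the paper's code): a bias-free complex linear
# map f(z) = W z satisfies f(e^{j*theta} z) = e^{j*theta} f(z), i.e. global
# rotation equivariance (GRE). A complex bias b is not rotated with the input,
# so it breaks the property.

def complex_linear(W, z, bias=None):
    """Apply complex matrix W (list of rows) to vector z, optionally adding a bias."""
    out = [sum(w * x for w, x in zip(row, z)) for row in W]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

def rotate(z, theta):
    """Multiply every element by the same unit phasor e^{j*theta}."""
    phasor = cmath.exp(1j * theta)
    return [phasor * x for x in z]

def is_gre(f, z, theta, tol=1e-9):
    """Check f(rotate(z)) == rotate(f(z)) elementwise, up to a tolerance."""
    lhs = f(rotate(z, theta))
    rhs = rotate(f(z), theta)
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

random.seed(0)

def rand_c():
    """Draw a random complex number with Gaussian real and imaginary parts."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))

W = [[rand_c() for _ in range(4)] for _ in range(3)]
b = [rand_c() for _ in range(3)]
z = [rand_c() for _ in range(4)]
theta = 1.234

print(is_gre(lambda v: complex_linear(W, v), z, theta))          # True
print(is_gre(lambda v: complex_linear(W, v, bias=b), z, theta))  # False
```

The biased map fails because the two sides differ by b(1 - e^{jθ}), which vanishes only when θ is a multiple of 2π.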

๐Ÿ“ Abstract
While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging: conventional networks typically operate within a flat Euclidean feature space, which makes it difficult to model the underlying circular topology of the phase. To address this, we propose a manifold-aware magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing a Global Rotation Equivariance (GRE) property. Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual-FFN (HADF) bottleneck for unified feature fusion, both of which are designed to preserve GRE in the phase stream. Comprehensive evaluations across phase retrieval, denoising, dereverberation, and bandwidth extension tasks validate the superiority of the proposed method over multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. Its overall superiority is also established in universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, consistent with the intrinsic circular nature of the phase. The source code is available at https://github.com/wangchengzhong/RENet.
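The idea of "modulus-based information exchange" can be illustrated with a minimal sketch. This is a hypothetical toy, not the MPICM described in the paper or the code in the RENet repository. It assumes the phase stream holds complex features z and the magnitude stream holds real features m. Two made-up scalar weights, `a` and `c`, stand in for learned layers. Because |z| does not change under a global rotation z → e^{jθ}z, a real-valued gate computed from |z| and m can rescale z without breaking GRE, while the magnitude stream stays rotation invariant.

```python
import cmath
import math
import random

# Hypothetical sketch of modulus-based magnitude-phase interaction (not MPICM).
# Assumption: phase stream = complex features z, magnitude stream = real features m.
# Only |z| crosses into the magnitude stream, and z is scaled by a real gate,
# so the phase stream stays globally rotation equivariant.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interact(z, m, a=0.8, c=0.5):
    """Return (new_z, new_m).

    new_m: magnitude stream updated with the rotation-invariant moduli |z|.
    new_z: phase stream rescaled by a real gate computed from new_m and |z|.
    a and c are hypothetical scalar weights standing in for learned layers.
    """
    moduli = [abs(x) for x in z]  # unchanged by z -> e^{j*theta} z
    new_m = [mi + a * r for mi, r in zip(m, moduli)]
    gates = [sigmoid(c * mi + r) for mi, r in zip(new_m, moduli)]  # real-valued
    new_z = [g * x for g, x in zip(gates, z)]
    return new_z, new_m

random.seed(1)
z = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
m = [random.random() for _ in range(5)]
theta = -2.0
phasor = cmath.exp(1j * theta)

z1, m1 = interact(z, m)
z2, m2 = interact([phasor * x for x in z], m)

# Phase stream output rotates with the input (equivariant).
print(all(abs(b - phasor * a) < 1e-9 for a, b in zip(z1, z2)))  # True
# Magnitude stream output is unaffected by the rotation (invariant).
print(all(abs(a - b) < 1e-9 for a, b in zip(m1, m2)))  # True
```

In this sketch, GRE holds because every quantity passed from the phase stream into a nonlinearity is a modulus. If the gate were fed the real part or the angle of z instead, it would depend on the global rotation, and the equivariance check would generally fail.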
Problem

Research questions and friction points this paper is trying to address.

phase modeling
speech enhancement
circular topology
rotation equivariance
magnitude-phase interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global Rotation Equivariance
Phase Modeling
Manifold-aware Framework
Magnitude-Phase Interaction
Circular Topology
๐Ÿ”Ž Similar Papers
No similar papers found.
Chengzhong Wang
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China, and also with University of Chinese Academy of Sciences, Beijing 100049, China
Andong Li
Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China, and also with University of Chinese Academy of Sciences, Beijing 100049, China
Dingding Yao
Institute of Acoustics, Chinese Academy of Sciences
Spatial Hearing, Binaural Technology, Auditory Processing, HRTF
Junfeng Li
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China, and also with University of Chinese Academy of Sciences, Beijing 100049, China