Soft Frequency Disentanglement for Neural Audio Codecs

📅 2025-10-04
🤖 AI Summary
The representations learned by neural audio codecs remain hard to interpret, and existing disentanglement techniques depend heavily on specific datasets or tasks. To address this, we propose a time-domain neural codec based on spectral decomposition of the input signal. It incorporates a soft frequency-band decoupling mechanism that explicitly models semantic independence across frequency subbands during encoding, and it integrates spectral prior structures to guide representation learning. A differentiable separation loss enables joint time-frequency modeling, improving semantic disentanglement and cross-task generalization. Experiments show that the model surpasses state-of-the-art baselines in reconstruction fidelity (STOI, PESQ) and perceptual quality (MOS). It also remains robust across diverse downstream tasks (speech enhancement, source separation, and audio synthesis) without task-specific architectural modifications.

📝 Abstract
While neural models have significantly advanced audio feature extraction, the interpretability of the learned representations remains a critical challenge. To address this, disentanglement techniques have been integrated into discrete neural audio codecs to impose structure on the extracted tokens. However, these approaches often depend strongly on specific datasets or task formulations. In this work, we propose a disentangled neural audio codec that leverages spectral decomposition of time-domain signals to enhance representation interpretability. Experimental evaluations demonstrate that our method surpasses a state-of-the-art baseline in both reconstruction fidelity and perceptual quality.
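
To make "spectral decomposition of time-domain signals" concrete, here is a minimal sketch of one generic way to split a waveform into frequency bands that are themselves time-domain signals. It is an illustration only, not the paper's method: the function name `split_into_bands`, the band edge at 1000 Hz, and the use of hard FFT masks are all assumptions, and the abstract does not say which decomposition the codec uses.

```python
import numpy as np


def split_into_bands(x, sample_rate, edges_hz):
    """Split a 1-D signal into band-limited time-domain signals.

    edges_hz holds the interior band edges in ascending order,
    so [500.0, 2000.0] yields three bands. (Hypothetical helper.)
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.shape[0]
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Band boundaries: start at 0 Hz, end above the Nyquist frequency.
    bounds = [0.0, *edges_hz, np.inf]
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Hard, non-overlapping mask: every FFT bin belongs to exactly one band.
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=n))
    return np.stack(bands)


# Toy input: a 200 Hz tone plus a quieter 3000 Hz tone, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 200 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 3000 * t)
x = low_tone + high_tone

bands = split_into_bands(x, sr, [1000.0])
```

Because the masks partition the spectrum, the bands sum back to the input signal, so a codec that encodes each band separately loses nothing at the decomposition step itself.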

Problem

Research questions and friction points this paper is trying to address.

Enhancing interpretability of neural audio codec representations
Addressing dataset dependency in disentanglement techniques
Improving reconstruction fidelity and perceptual quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral decomposition enhances interpretability of audio codecs
Disentangled neural codec improves reconstruction fidelity
Method surpasses baseline in perceptual quality