🤖 AI Summary
Weakly interpretable representations and dataset- or task-specific disentanglement remain critical bottlenecks in neural audio codecs. To address this, we propose a time-domain neural codec based on spectral decomposition of the input signal, incorporating a soft frequency-band decoupling mechanism that explicitly models semantic independence across frequency subbands during encoding. Our approach further integrates spectral prior structures to guide representation learning. A differentiable separation loss enables joint time–frequency modeling, substantially improving semantic disentanglement and cross-task generalization. Experiments demonstrate that our model surpasses state-of-the-art baselines in reconstruction fidelity (STOI, PESQ) and perceptual quality (MOS). It exhibits strong robustness and versatility across diverse downstream tasks, including speech enhancement, source separation, and audio synthesis, without task-specific architectural modifications.
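The summary describes decomposing a time-domain signal into frequency subbands before encoding. The paper's actual decoupling mechanism is learned and soft; as a rough intuition for the hard-mask version of such a spectral decomposition, here is a minimal numpy sketch (the function name, band edges, and toy signal are all illustrative, not from the paper):

```python
import numpy as np

def split_subbands(x, sr, edges):
    """Split a time-domain signal into frequency subbands via hard FFT masks.

    edges: band-edge frequencies in Hz, e.g. [0, 1000, sr / 2].
    Returns shape (num_bands, len(x)); the bands sum back to x.
    """
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    bands = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = i == len(edges) - 2
        hi_mask = freqs <= hi if last else freqs < hi  # include Nyquist in last band
        bands.append(np.fft.irfft(X * ((freqs >= lo) & hi_mask), n=len(x)))
    return np.stack(bands)

# Toy signal: a 440 Hz tone plus a quieter 3 kHz tone.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

bands = split_subbands(x, sr, [0, 1000, sr / 2])
# The hard masks form a partition of the spectrum, so the bands sum back to x,
# and each tone lands in its own subband.
assert np.allclose(bands.sum(axis=0), x, atol=1e-6)
```

In the codec itself the split is soft (learned, differentiable masks rather than binary ones), which is what lets the separation loss be trained jointly with the encoder; this sketch only shows the underlying time-to-subband view of the signal.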
📝 Abstract
While neural models have led to significant advances in audio feature extraction, the interpretability of the learned representations remains a critical challenge. To address this, disentanglement techniques have been integrated into discrete neural audio codecs to impose structure on the extracted tokens. However, these approaches often exhibit strong dependencies on specific datasets or task formulations. In this work, we propose a disentangled neural audio codec that leverages spectral decomposition of time-domain signals to enhance representation interpretability. Experimental evaluations demonstrate that our method surpasses a state-of-the-art baseline in both reconstruction fidelity and perceptual quality.