Soft Disentanglement in Frequency Bands for Neural Audio Codecs

📅 2025-10-04

📈 Citations: 0

✨ Influential: 0

career value

232K/year

🤖 AI Summary

Neural audio codecs often rely on data- or task-specific priors to disentangle frequency-band features, resulting in poor interpretability and limited generalizability. Method: We propose a generic soft disentanglement representation learning framework. It first applies spectral decomposition to project time-domain audio into orthogonal frequency-band subspaces; then employs a multi-branch encoder to model each band independently, jointly optimized via reconstruction and perceptual losses. Crucially, the framework imposes no assumptions about task structure or data distribution, enabling task-agnostic, soft intra-band semantic disentanglement. Contribution/Results: Experiments demonstrate significant improvements over state-of-the-art codecs in objective audio quality metrics (e.g., PESQ, STOI) and perceptual fidelity. Moreover, the learned representations exhibit strong generalization to downstream tasks—such as audio inpainting—and provide interpretable, structured frequency-band semantics without architectural or prior constraints.

Technology Category

Application Category

📝 Abstract

In neural-based audio feature extraction, ensuring that representations capture disentangled information is crucial for model interpretability. However, existing disentanglement methods often rely on assumptions that are highly dependent on data characteristics or specific tasks. In this work, we introduce a generalizable approach for learning disentangled features within a neural architecture. Our method applies spectral decomposition to time-domain signals, followed by a multi-branch audio codec that operates on the decomposed components. Empirical evaluations demonstrate that our approach achieves better reconstruction and perceptual performance compared to a state-of-the-art baseline while also offering potential advantages for inpainting tasks.

Problem

Research questions and friction points this paper is trying to address.

Achieving disentangled representations in neural audio codecs

Improving model interpretability through frequency band decomposition

Enhancing reconstruction and perceptual quality for audio tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral decomposition of time-domain signals

Multi-branch audio codec on decomposed components

Soft frequency band disentanglement for neural codecs

🔎 Similar Papers

Learning Source Disentanglement in Neural Audio Codec