UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

📅 2025-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Single-codebook neural audio codecs have limited cross-domain representation capability because speech, music, and sound effects follow different distributions. Method: We propose UniCodec, the first unified single-codebook codec for multi-domain audio modeling. It combines a partitioned, domain-adaptive codebook with a domain-aware mixture-of-experts (MoE) architecture and self-supervised masked prediction, all built on a single-layer vector quantizer. Contribution/Results: The method improves both semantic density and waveform reconstruction fidelity. In experiments, it outperforms existing unified single-codebook codecs on waveform reconstruction quality across all three audio domains. It also matches or exceeds domain-specific state-of-the-art (SOTA) models in both acoustic fidelity and semantic expressiveness, bringing high-quality cross-domain audio generation and understanding to a single-codebook framework.
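To make the domain-aware MoE idea concrete, the sketch below mixes the outputs of per-domain experts with a set of gate weights. It is a minimal illustration under assumptions, not the paper's architecture: the names `EXPERTS`, `scale_expert`, and `moe_layer` are hypothetical, the experts are fixed scalings rather than learned layers, and the summary does not say whether routing is hard (one expert per domain) or soft (a weighted blend), so both are shown.

```python
from typing import Callable, Dict, List

Vector = List[float]


def scale_expert(factor: float) -> Callable[[Vector], Vector]:
    """Toy expert: multiplies every feature by a fixed factor."""
    return lambda x: [factor * v for v in x]


# Hypothetical experts, one per audio domain. In a real codec these would be
# learned network layers; the factors here are arbitrary illustration values.
EXPERTS: Dict[str, Callable[[Vector], Vector]] = {
    "speech": scale_expert(0.5),
    "music": scale_expert(2.0),
    "sound": scale_expert(-1.0),
}


def moe_layer(x: Vector, gate: Dict[str, float]) -> Vector:
    """Return the gate-weighted sum of expert outputs.

    `gate` maps a domain name to its weight; weights are assumed to sum to 1.
    """
    out = [0.0] * len(x)
    for domain, weight in gate.items():
        expert_out = EXPERTS[domain](x)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out


# Hard routing: the input is known to be music, so only that expert is used.
hard = moe_layer([1.0, 2.0], {"music": 1.0})
# Soft routing: an ambiguous input blends the speech and music experts.
soft = moe_layer([1.0, 2.0], {"speech": 0.5, "music": 0.5})
```

Hard routing is the simplest option when a domain label is available. A soft gate may suit mixed content, such as speech over background music.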

📝 Abstract

The emergence of audio language models is empowered by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language model paradigms. The shift from multi-layer residual vector quantizers to single-layer quantizers benefits autoregressive decoding with language models. However, the capability to handle multi-domain audio signals through a single codebook remains constrained by inter-domain distribution discrepancies. In this work, we introduce UniCodec, a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound. To achieve this, we propose a partitioned domain-adaptive codebook method and a domain Mixture-of-Experts strategy to capture the distinct characteristics of each audio domain. Furthermore, to enrich the semantic density of the codec without auxiliary modules, we propose a self-supervised masked prediction modeling approach. Comprehensive objective and subjective evaluations demonstrate that UniCodec achieves excellent audio reconstruction performance across the three audio domains, outperforming existing unified neural codecs with a single codebook and even surpassing state-of-the-art domain-specific codecs in both acoustic and semantic representation capabilities.
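One way to picture a partitioned domain-adaptive codebook is as a single shared table of code vectors in which each domain owns a contiguous range of indices. The sketch below is an assumption-laden toy, not the paper's implementation: the partition sizes, the `PARTITIONS` layout, and the helpers `make_codebook` and `quantize` are all hypothetical, and the nearest-neighbor search is simply restricted to the slice belonging to the input's domain.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: one codebook of 12 entries, split into three
# contiguous index ranges [start, end). Sizes are illustrative only.
PARTITIONS: Dict[str, Tuple[int, int]] = {
    "speech": (0, 4),
    "music": (4, 8),
    "sound": (8, 12),
}


def make_codebook(num_codes: int, dim: int) -> List[List[float]]:
    """Deterministic toy codebook: entry i is the vector [i, i, ...]."""
    return [[float(i)] * dim for i in range(num_codes)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frame: Sequence[float], codebook: List[List[float]], domain: str) -> int:
    """Return the index of the nearest code inside the domain's partition."""
    start, end = PARTITIONS[domain]
    return min(range(start, end), key=lambda i: squared_distance(frame, codebook[i]))


codebook = make_codebook(num_codes=12, dim=2)
frame = [5.2, 4.9]

# The same frame maps to a different token depending on the domain partition,
# but every token still indexes into the one shared codebook.
tokens = {domain: quantize(frame, codebook, domain) for domain in PARTITIONS}
```

Because all indices live in one table, a downstream language model would see a single token vocabulary, while each domain only competes for codes within its own range.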

Problem

Research questions and friction points this paper is trying to address.

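Handling multi-domain audio signals (speech, music, and sound) effectively despite inter-domain distribution discrepancies.

Unifying these domains in an audio codec with a single codebook.

Improving both acoustic and semantic representation capabilities.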

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single domain-adaptive codebook method

Domain Mixture-of-Experts strategy

Self-supervised mask prediction modeling
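The self-supervised masked prediction idea in the list above can be sketched as: hide a subset of frames, have the model predict a target at each position, and score only the hidden positions. This is a generic illustration under assumptions, not the paper's training objective: the mask ratio, the discrete token targets, and the helpers `make_mask` and `masked_cross_entropy` are hypothetical, and a uniform distribution stands in for real model predictions.

```python
import math
import random
from typing import List, Sequence


def make_mask(num_frames: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Mark round(mask_ratio * num_frames) randomly chosen frames as masked."""
    num_masked = round(mask_ratio * num_frames)
    masked = set(rng.sample(range(num_frames), num_masked))
    return [i in masked for i in range(num_frames)]


def masked_cross_entropy(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean negative log-likelihood of the targets, over masked frames only."""
    losses = [
        -math.log(probs[t][targets[t]])
        for t in range(len(targets))
        if mask[t]
    ]
    if not losses:
        raise ValueError("mask selects no frames")
    return sum(losses) / len(losses)


rng = random.Random(0)  # fixed seed so the example is repeatable
mask = make_mask(num_frames=10, mask_ratio=0.3, rng=rng)

# Stand-in predictions: a uniform distribution over 4 possible tokens per frame.
probs = [[0.25] * 4 for _ in range(10)]
targets = [0] * 10

loss = masked_cross_entropy(probs, targets, mask)  # equals log(4) for uniform predictions
```

Scoring only the masked frames means the model cannot lower the loss by copying visible input, so it has to infer hidden content from context.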
