S-PRESSO: Ultra Low Bitrate Sound Effect Compression With Diffusion Autoencoders And Offline Quantization

📅 2026-02-16

🤖 AI Summary

Existing neural audio compression methods struggle to reconstruct high-sampling-rate audio (e.g., 48 kHz) at extremely low bitrates (<1 kbps), where severe distortion and audible artifacts are common. This work proposes S-PRESSO, a 48 kHz sound effect compression model that uses a pretrained latent diffusion model as its decoder and a latent encoder that learns the compressed embeddings; offline quantization then yields discrete embeddings alongside the continuous ones. S-PRESSO operates at bitrates down to 0.096 kbps and frame rates down to 1 Hz (a 750× compression rate). At these rates it relies on the generative prior of the diffusion decoder to produce convincing, realistic reconstructions rather than exact ones. The authors report that it outperforms both continuous and discrete baselines in audio quality, acoustic similarity, and reconstruction metrics.
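To make the bitrate figure concrete, the short Python sketch below converts a bitrate and a frame rate into a per-frame bit budget. It assumes the 0.096 kbps figure is reached at the 1 Hz frame rate. The function names and the example codebook layout (8 codebooks of 4096 entries) are hypothetical illustrations, not values taken from the paper.

```python
import math


def bits_per_frame(bitrate_kbps: float, frame_rate_hz: float) -> float:
    """Bits available for each latent frame at a given bitrate and frame rate."""
    return bitrate_kbps * 1000.0 / frame_rate_hz


def codes_bits_per_frame(num_codebooks: int, codebook_size: int) -> float:
    """Bits spent per frame when each frame stores one index per codebook."""
    return num_codebooks * math.log2(codebook_size)


# Assumption: 0.096 kbps is paired with the 1 Hz frame rate.
budget = bits_per_frame(0.096, 1.0)
print(f"budget per frame: {budget:.0f} bits")

# Hypothetical layout that would fit the same budget.
print(f"8 codebooks x 4096 entries: {codes_bits_per_frame(8, 4096):.0f} bits")
```

Under that assumption, each one-second frame has a budget of about 96 bits, which is why the decoder has to supply most of the acoustic detail from its generative prior.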

📝 Abstract

Neural audio compression models have recently achieved extreme compression rates, enabling efficient latent generative modeling. Conversely, latent generative models have been applied to compression, pushing the limits of continuous and discrete approaches. However, existing methods remain constrained to low-resolution audio and degrade substantially at very low bitrates, where audible artifacts are prominent. In this paper, we present S-PRESSO, a 48 kHz sound effect compression model that produces both continuous and discrete embeddings at ultra-low bitrates, down to 0.096 kbps, via offline quantization. Our model relies on a pretrained latent diffusion model to decode compressed audio embeddings learned by a latent encoder. Leveraging the generative priors of the diffusion decoder, we achieve extremely low frame rates, down to 1 Hz (750x compression rate), producing convincing and realistic reconstructions at the cost of exact fidelity. Despite operating at high compression rates, we demonstrate that S-PRESSO outperforms both continuous and discrete baselines in audio quality, acoustic similarity and reconstruction metrics.
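The abstract does not say which quantizer is used, so the following is only a generic sketch of what "offline" quantization can look like: a codebook is fitted with plain k-means on embeddings from an already trained encoder, with no gradient flowing back into the encoder. Random vectors stand in for the encoder output, and the names `fit_codebook`, `quantize`, and `dequantize`, as well as all sizes, are hypothetical.

```python
import numpy as np


def quantize(embeddings: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return the index of the nearest codebook entry for each embedding."""
    # Squared Euclidean distances, shape (num_embeddings, num_codes).
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


def dequantize(codes: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map code indices back to their codebook vectors."""
    return codebook[codes]


def fit_codebook(embeddings: np.ndarray, num_codes: int,
                 num_iters: int = 20, seed: int = 0) -> np.ndarray:
    """Fit a k-means codebook on frozen embeddings (the encoder is not updated)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from distinct, randomly chosen embeddings.
    init = rng.choice(len(embeddings), size=num_codes, replace=False)
    codebook = embeddings[init].copy()
    for _ in range(num_iters):
        codes = quantize(embeddings, codebook)
        for k in range(num_codes):
            members = embeddings[codes == k]
            if len(members) > 0:  # keep the old entry if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook


# Stand-in for continuous embeddings produced by a trained, frozen encoder.
frames = np.random.default_rng(1).normal(size=(512, 16))
codebook = fit_codebook(frames, num_codes=32)
codes = quantize(frames, codebook)       # discrete embedding: one index per frame
approx = dequantize(codes, codebook)     # continuous approximation for the decoder
print(codes.shape, approx.shape)
```

With 32 codes, each frame costs log2(32) = 5 bits in this toy setup. A real system would need a much larger or multi-stage codebook to spend a budget on the order of 96 bits per frame.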

Problem

Research questions and friction points this paper is trying to address.

ultra low bitrate

audio compression

sound effect

audible artifacts

high compression rate

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion autoencoder

ultra low bitrate compression

offline quantization