SAME: A Semantically-Aligned Music Autoencoder

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of preserving reconstruction fidelity and downstream generative performance for music and general audio under extreme temporal compression ratios, up to 4096×. To this end, the authors propose SAME, a semantically-aligned autoencoder for stereo music and general audio that combines semantic regularization, phase-aware reconstruction losses, and improved discriminator designs with a transformer-based backbone. The study releases two open-weight variants: SAME-L, a large high-performance model, and SAME-S, a lightweight version deployable on CPU. According to the authors, the high compression ratio and the reliance on well-optimized transformer primitives substantially reduce computational cost while reconstruction quality and generative performance are maintained.
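The summary does not say what form SAME's phase-aware reconstruction loss takes. As an illustration only, here is a minimal stdlib-only sketch of one common pattern: a log-magnitude L1 term on STFT bins plus a magnitude-weighted phase penalty, 1 − cos(Δφ). The function names, the toy 16-point STFT, and the `phase_weight` parameter are hypothetical and are not taken from the paper.

```python
import cmath
import math


def stft(x, n_fft=16, hop=8):
    """Naive STFT: Hann-windowed frames (no padding), one-sided DFT bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(bins)
    return frames


def phase_aware_loss(ref, est, n_fft=16, hop=8, phase_weight=1.0, eps=1e-7):
    """Hypothetical loss: log-magnitude L1 plus a magnitude-weighted phase penalty."""
    mag_term, phase_term, count = 0.0, 0.0, 0
    for frame_r, frame_e in zip(stft(ref, n_fft, hop), stft(est, n_fft, hop)):
        for r, e in zip(frame_r, frame_e):
            mag_term += abs(math.log(abs(r) + eps) - math.log(abs(e) + eps))
            # 1 - cos(dphi) is 0 for matching phases and 2 for opposite ones;
            # weighting by |r| down-weights near-silent bins whose phase is unreliable.
            phase_term += abs(r) * (1.0 - math.cos(cmath.phase(r) - cmath.phase(e)))
            count += 1
    return (mag_term + phase_weight * phase_term) / count


# Usage: a sinusoid versus its polarity-inverted copy.
x = [math.sin(2 * math.pi * 3 * n / 16) for n in range(64)]
x_inv = [-v for v in x]
print(phase_aware_loss(x, x_inv, phase_weight=0.0))  # magnitude-only view
print(phase_aware_loss(x, x_inv))                    # phase term switches on
```

The usage lines show why a phase term can matter. `x` and `x_inv` have identical STFT magnitudes, so with `phase_weight=0.0` the loss is zero. With the default weighting it is positive, because every bin's phase is off by π.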
πŸ“ Abstract
Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an autoencoder for stereo music and general audio that reaches a 4096× temporal compression ratio while maintaining reconstruction quality and downstream generative performance. We achieve this by combining a transformer-based backbone with a set of semantic regularisation approaches, phase-aware reconstruction losses and improved discriminator designs. The architecture delivers substantial computational cost benefits through both its high compression ratio and its reliance on well-optimised transformer primitives. Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.
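To make the 4096× figure concrete, the arithmetic below converts a clip duration into a latent sequence length. The abstract does not state SAME's input sample rate, so 44.1 kHz is an assumption. The helper name `latent_frames`, the padding of the last partial window, and the smaller comparison ratios (chosen arbitrarily) are also hypothetical.

```python
import math


def latent_frames(duration_s, sample_rate=44_100, compression=4096):
    """Latent sequence length and frame rate for a purely temporal downsampling.

    Hypothetical helper: the 44.1 kHz sample rate is an assumption, and the
    last partial window is assumed to be padded to a full latent frame.
    """
    n_samples = round(duration_s * sample_rate)
    frames = math.ceil(n_samples / compression)
    return frames, sample_rate / compression


for ratio in (512, 2048, 4096):
    frames, rate = latent_frames(180.0, compression=ratio)  # a 3-minute track
    print(f"{ratio:>5}x: {frames:>6} latent frames at {rate:.2f} Hz")
```

Under these assumptions, a 3-minute track becomes 1,938 latent frames at about 10.77 Hz. Suppose a downstream model attends over all latent frames. Halving the sequence length (2048× to 4096×) then cuts the number of pairwise attention scores by roughly a factor of four. This is one concrete way a high compression ratio can translate into lower compute.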
Problem

latent representation
audio compression
music autoencoder
temporal compression ratio
stereo audio
Innovation

Semantically-Aligned
Music Autoencoder
Temporal Compression
Phase-Aware Reconstruction
Transformer-Based Architecture
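The source does not describe how SAME's semantic regularisation is implemented. One widely used way to align latents with semantics projects each latent frame and pulls it toward the features of a frozen, pretrained audio encoder using a cosine-similarity loss. The sketch below assumes that pattern and is written in plain Python to stay self-contained. All of the following are hypothetical stand-ins, not the paper's method:

- the projection matrix `proj`
- the teacher features
- the average-pooling step that brings the teacher's frame rate down to the latent frame rate

```python
import math


def avg_pool(frames, factor):
    """Average consecutive groups of `factor` feature vectors (drops a ragged tail)."""
    pooled = []
    for i in range(0, len(frames) - factor + 1, factor):
        group = frames[i:i + factor]
        pooled.append([sum(col) / factor for col in zip(*group)])
    return pooled


def cosine(u, v, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def matvec(matrix, vec):
    """Plain matrix-vector product (rows of `matrix` dotted with `vec`)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def semantic_alignment_loss(latents, teacher_feats, proj, pool_factor):
    """Hypothetical loss: 1 - mean cosine(proj @ latent_t, pooled teacher_t)."""
    targets = avg_pool(teacher_feats, pool_factor)
    n = min(len(latents), len(targets))
    sims = [cosine(matvec(proj, latents[t]), targets[t]) for t in range(n)]
    return 1.0 - sum(sims) / n


# Usage: teacher features at twice the latent frame rate, identity projection.
teacher = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(semantic_alignment_loss([[1.0, 0.0], [0.0, 1.0]], teacher, identity, 2))
```

The loss is near 0 when the projected latents point the same way as the pooled teacher features and near 2 when they point the opposite way. In a training setup, `proj` would typically be learned jointly with the encoder, and this term would be added to the reconstruction objective with some weight.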