Latent Fourier Transform

📅 2026-04-20

🤖 AI Summary

Existing generative music models lack intuitive, controllable ways to edit musical structure across multiple timescales. This work combines a diffusion autoencoder with a Fourier transform applied in latent space, adding frequency-domain controls to the latent representation. Masking latents in the frequency domain separates musical patterns by timescale along a continuous frequency axis. The method works much like an audio equalizer, except that it shapes musical structure through latent-space frequencies instead of shaping timbre through audible frequencies. It generates variations and blends of reference examples that preserve characteristics within specified frequency bands. Experiments reveal that different musical attributes reside in different regions of the latent spectrum. In experiments and listening tests, the framework improves condition adherence and generation quality over baselines, and it supports interpretable, interactive spectral editing and blending.
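The equalizer analogy can be made concrete with a minimal sketch. It assumes a latent is a sequence of T time frames with D real-valued channels and that the latent-space Fourier transform is a discrete Fourier transform along the time axis of each channel. The names `latent_eq` and `band_mask` are hypothetical, the naive DFT stands in for a real FFT, and none of this is the LatentFT implementation; it only shows what keeping one band of latent-space frequencies means.

```python
import cmath
import math
from typing import List

# Assumed layout (hypothetical): a latent is T time frames, each with D real channels.
Latent = List[List[float]]


def dft(x: List[complex]) -> List[complex]:
    """Naive O(T^2) discrete Fourier transform of one channel over time."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft(): back from latent-space frequencies to time frames."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_mask(n: int, low: int, high: int) -> List[float]:
    """Hypothetical band-pass mask over n frequency bins.

    Bin k has frequency index min(k, n - k); bins k and n - k are kept or dropped
    together, so a real-valued latent stays real after filtering.
    """
    return [1.0 if low <= min(k, n - k) <= high else 0.0 for k in range(n)]


def latent_eq(z: Latent, low: int, high: int) -> Latent:
    """Keep only latent-space frequencies in [low, high] along the time axis,
    applied to each channel independently (a band-pass 'equalizer' setting)."""
    num_frames, num_channels = len(z), len(z[0])
    mask = band_mask(num_frames, low, high)
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for d in range(num_channels):
        spectrum = dft([complex(z[t][d]) for t in range(num_frames)])
        filtered = idft([s * m for s, m in zip(spectrum, mask)])
        for t in range(num_frames):
            out[t][d] = filtered[t].real  # imaginary part is ~0 by symmetry
    return out
```

For example, with T = 16 and a single channel equal to 1 + cos(2π·4t/16), `latent_eq(z, 0, 0)` keeps only the constant 1.0 (the slowest timescale), while `latent_eq(z, 4, 4)` keeps only the oscillation. Low indices correspond to slowly varying structure and high indices to fast changes, which is the sense in which frequency stands in for timescale.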

📝 Abstract

We introduce the Latent Fourier Transform (LatentFT), a framework that provides novel frequency-domain controls for generative music models. LatentFT combines a diffusion autoencoder with a latent-space Fourier transform to separate musical patterns by timescale. By masking latents in the frequency domain during training, our method yields representations that can be manipulated coherently at inference. This allows us to generate musical variations and blends from reference examples while preserving characteristics at desired timescales, which are specified as frequencies in the latent space. LatentFT parallels the role of the equalizer in music production: while traditional equalizers operate on audible frequencies to shape timbre, LatentFT operates on latent-space frequencies to shape musical structure. Experiments and listening tests show that LatentFT improves condition adherence and quality compared to baselines. We also present a technique for hearing frequencies in the latent space in isolation, and show that different musical attributes reside in different regions of the latent spectrum. Our results show how frequency-domain control in latent space provides an intuitive, continuous frequency axis for conditioning and blending, advancing us toward more interpretable and interactive generative music models.
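Blending by timescale can be illustrated in the same hypothetical setting (a latent of T time frames by D real channels, transformed along time): keep the low latent-space frequencies of one reference and take the higher ones from another. This sketch combines the two spectra directly, which is an assumption for illustration only; in LatentFT the masked latents condition a diffusion decoder that generates the result, and that decoder is not modeled here. `random_band_mask` is likewise a hypothetical stand-in for training-time frequency masking, since the abstract does not say how masks are sampled.

```python
import cmath
import math
import random
from typing import List

# Hypothetical layout: a latent is T time frames, each with D real channels.
Latent = List[List[float]]


def dft(x: List[complex]) -> List[complex]:
    """Naive O(T^2) discrete Fourier transform of one channel over time."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def freq_index(k: int, n: int) -> int:
    """Frequency index of DFT bin k for a length-n real sequence (0 = constant)."""
    return min(k, n - k)


def blend_by_band(z_a: Latent, z_b: Latent, cutoff: int) -> Latent:
    """Per channel, take frequencies with index <= cutoff from z_a (slow structure)
    and all higher frequencies from z_b (fast structure)."""
    num_frames, num_channels = len(z_a), len(z_a[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for d in range(num_channels):
        spec_a = dft([complex(z_a[t][d]) for t in range(num_frames)])
        spec_b = dft([complex(z_b[t][d]) for t in range(num_frames)])
        mixed = [spec_a[k] if freq_index(k, num_frames) <= cutoff else spec_b[k]
                 for k in range(num_frames)]
        blended = idft(mixed)
        for t in range(num_frames):
            out[t][d] = blended[t].real  # imaginary part is ~0 by symmetry
    return out


def random_band_mask(n: int, rng: random.Random) -> List[float]:
    """Hypothetical training-time mask: keep one random contiguous band of
    frequency indices (symmetric across bins k and n - k) and zero the rest."""
    nyquist = n // 2
    low = rng.randint(0, nyquist)
    high = rng.randint(low, nyquist)
    return [1.0 if low <= freq_index(k, n) <= high else 0.0 for k in range(n)]
```

With T = 16, z_a = 2 + cos(2π·4t/16) and z_b = 0.5 + sin(2π·4t/16) (one channel each), `blend_by_band(z_a, z_b, cutoff=1)` returns 2 + sin(2π·4t/16): the slow component of the first reference combined with the fast component of the second. Moving `cutoff` slides the boundary along the continuous frequency axis, much like moving a crossover point on an equalizer.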

Problem

Research questions and friction points this paper is trying to address.

generative music models

frequency-domain control

timescale manipulation

latent space

musical structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Fourier Transform

frequency-domain control

diffusion autoencoder