Latent Fourier Transform

📅 2026-04-20
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing generative music models lack intuitive and controllable editing capabilities at the structural level across multiple temporal scales. This work proposes a novel approach that integrates a diffusion autoencoder with latent-space Fourier transforms, introducing frequency-domain control into the latent representation for the first time. By applying spectral masks, the method disentangles and manipulates musical features according to their temporal scales along a continuous frequency axis, functioning analogously to an audio equalizer but operating on structural attributes rather than raw waveforms. It enables example-based generation of variants and blends that preserve characteristics within specified frequency bands. Experiments reveal that distinct musical attributes exhibit localized distributions in the latent spectrogram, and the proposed framework significantly improves both conditional fidelity and generation quality while supporting interpretable, interactive spectral editing and fusion.
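
The spectral-mask idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the `(T, D)` latent layout, the `frame_rate` value, and the function name `latent_band_mask` are assumptions made for illustration, and the diffusion autoencoder that would encode audio into latents and decode the result is not shown.

```python
import numpy as np

def latent_band_mask(latents, frame_rate, low_hz, high_hz):
    """Keep only latent-space frequencies in [low_hz, high_hz].

    latents:    array of shape (T, D), T latent frames by D channels
                (hypothetical layout, not taken from the paper).
    frame_rate: latent frames per second (assumed value).
    """
    num_frames = latents.shape[0]
    # Fourier transform along the latent time axis, per channel.
    spectrum = np.fft.rfft(latents, axis=0)
    # Frequency of each bin, in cycles per second of latent time.
    freqs = np.fft.rfftfreq(num_frames, d=1.0 / frame_rate)
    # Binary band-pass mask, broadcast across channels.
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    masked = spectrum * keep[:, None]
    # Back to the latent time domain with the original length.
    return np.fft.irfft(masked, n=num_frames, axis=0)

# Usage: keep only slow structure (below 1 Hz) of a random latent sequence.
rng = np.random.default_rng(0)
example_latents = rng.standard_normal((64, 8))
slow_part = latent_band_mask(example_latents, frame_rate=16.0,
                             low_hz=0.0, high_hz=1.0)
```

A low band keeps slowly varying structure and a high band keeps fast variation, which is the sense in which the mask behaves like an equalizer over timescales rather than over audible pitch.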

📝 Abstract
We introduce the Latent Fourier Transform (LatentFT), a framework that provides novel frequency-domain controls for generative music models. LatentFT combines a diffusion autoencoder with a latent-space Fourier transform to separate musical patterns by timescale. By masking latents in the frequency domain during training, our method yields representations that can be manipulated coherently at inference. This allows us to generate musical variations and blends from reference examples while preserving characteristics at desired timescales, which are specified as frequencies in the latent space. LatentFT parallels the role of the equalizer in music production: while traditional equalizers operate on audible frequencies to shape timbre, LatentFT operates on latent-space frequencies to shape musical structure. Experiments and listening tests show that LatentFT improves condition adherence and quality compared to baselines. We also present a technique for hearing frequencies in the latent space in isolation, and show that different musical attributes reside in different regions of the latent spectrum. Our results show how frequency-domain control in latent space provides an intuitive, continuous frequency axis for conditioning and blending, advancing us toward more interpretable and interactive generative music models.
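
Blending two reference examples by latent-frequency band might look roughly like the sketch below. It assumes both references are already encoded as `(T, D)` latent arrays at a shared `frame_rate`; the function name `blend_by_band` and the hard `cutoff_hz` split are hypothetical choices for illustration. The abstract does not specify how blends are formed, and in the described framework a diffusion decoder would generate audio from such latents, which this sketch omits.

```python
import numpy as np

def blend_by_band(latents_a, latents_b, frame_rate, cutoff_hz):
    """Take latent frequencies <= cutoff_hz from A and the rest from B.

    Both inputs are (T, D) arrays of latent frames (assumed layout).
    """
    if latents_a.shape != latents_b.shape:
        raise ValueError("reference latents must have the same shape")
    num_frames = latents_a.shape[0]
    # Spectra of both references along the latent time axis.
    spec_a = np.fft.rfft(latents_a, axis=0)
    spec_b = np.fft.rfft(latents_b, axis=0)
    freqs = np.fft.rfftfreq(num_frames, d=1.0 / frame_rate)
    # Slow (long-timescale) bins come from A, fast bins from B.
    from_a = (freqs <= cutoff_hz)[:, None]
    mixed = np.where(from_a, spec_a, spec_b)
    return np.fft.irfft(mixed, n=num_frames, axis=0)

# Usage: long-timescale structure from A, short-timescale detail from B.
rng = np.random.default_rng(0)
ref_a = rng.standard_normal((64, 8))
ref_b = rng.standard_normal((64, 8))
blended = blend_by_band(ref_a, ref_b, frame_rate=16.0, cutoff_hz=1.0)
```

Because the split is a frequency in latent time, moving `cutoff_hz` continuously shifts how much of each reference survives, which matches the continuous frequency axis the abstract describes for conditioning and blending.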

Problem

Research questions and friction points this paper is trying to address.

generative music models
frequency-domain control
timescale manipulation
latent space
musical structure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Fourier Transform
frequency-domain control
diffusion autoencoder
generative music modeling
latent-space manipulation