Unified Latents (UL): How to train your latents

📅 2026-02-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently learning a unified latent representation that simultaneously achieves high reconstruction quality and low bitrate. The authors propose a novel latent representation learning framework that, for the first time, explicitly aligns the noise level at the encoder output with the minimum noise level of the diffusion prior. This alignment introduces a tight upper bound on bitrate and enables joint optimization of representation fidelity and training efficiency through a streamlined objective. By integrating diffusion-based decoding, noise schedule alignment, and a FLOPs-aware training strategy, the method achieves an FID of 1.4 and high PSNR on ImageNet-512 with lower computational cost than Stable Diffusion-based latent approaches, and sets a new state of the art on Kinetics-600 with an FVD of 1.3.
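The core idea, as summarized above, is to fix the encoder's output noise at the diffusion prior's minimum noise level, which yields a simple upper bound on the latent bitrate. A minimal sketch of that coupling, assuming a Gaussian encoder and a standard-normal prior (the function and variable names here are hypothetical, not the paper's actual code):

```python
# Hypothetical sketch: tie the encoder's output noise to the diffusion
# schedule's minimum noise level sigma_min, and bound the latent bitrate
# with the Gaussian KL to a unit prior. Not the paper's implementation.
import numpy as np

def kl_bitrate_upper_bound(mu, sigma_min):
    """Sum over latent dimensions of KL( N(mu, sigma_min^2) || N(0, 1) )
    in nats. With the encoder noise clamped to the prior's minimum noise
    level, this KL acts as an upper bound on the latent bitrate."""
    var = sigma_min ** 2
    kl_per_dim = 0.5 * (var + mu ** 2 - 1.0 - np.log(var))
    return kl_per_dim.sum()

def encode(x, sigma_min, rng):
    """Toy encoder: whiten the input to get the latent mean, then add
    noise at exactly sigma_min, so encoder samples are valid inputs to
    a diffusion decoder at its smallest timestep."""
    mu = (x - x.mean()) / (x.std() + 1e-8)
    z = mu + sigma_min * rng.standard_normal(mu.shape)
    return mu, z

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
sigma_min = 0.05  # assumed minimum noise level of the diffusion schedule
mu, z = encode(x, sigma_min, rng)
bits = kl_bitrate_upper_bound(mu, sigma_min) / np.log(2.0)  # nats -> bits
```

Because the encoder's noise level matches the decoder's minimum schedule noise, the decoder never sees latents cleaner than it was trained on, which is what lets a single objective regularize rate and reconstruction jointly.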

๐Ÿ“ Abstract
We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
Problem

Research questions and friction points this paper is trying to address.

latent representation
diffusion prior
bitrate
reconstruction quality
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Latents
diffusion prior
latent representation
bitrate bound
efficient training