Unified Latents (UL): How to train your latents

📅 2026-02-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently learning a unified latent representation that simultaneously achieves high reconstruction quality and low bitrate. The authors propose a novel latent representation learning framework that, for the first time, explicitly aligns the noise level at the encoder output with the minimum noise level of the diffusion prior. This alignment introduces a tight upper bound on bitrate and enables joint optimization of representation fidelity and training efficiency through a streamlined objective. By integrating diffusion-based decoding, noise schedule alignment, and a FLOPs-aware training strategy, the method achieves an FID of 1.4 and high PSNR on ImageNet-512 with lower computational cost than Stable Diffusion-based latent approaches, and sets a new state of the art on Kinetics-600 with an FVD of 1.3.
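The core idea, as summarized above, is to fix the encoder's output noise at the diffusion prior's minimum noise level, which yields a simple upper bound on the latent bitrate. A minimal sketch of that coupling, assuming a Gaussian encoder and a standard-normal prior (the function and variable names here are hypothetical, not the paper's actual code):

```python
# Hypothetical sketch: tie the encoder's output noise to the diffusion
# schedule's minimum noise level sigma_min, and bound the latent bitrate
# with the Gaussian KL to a unit prior. Not the paper's implementation.
import numpy as np

def kl_bitrate_upper_bound(mu, sigma_min):
    """Sum over latent dimensions of KL( N(mu, sigma_min^2) || N(0, 1) )
    in nats. With the encoder noise clamped to the prior's minimum noise
    level, this KL acts as an upper bound on the latent bitrate."""
    var = sigma_min ** 2
    kl_per_dim = 0.5 * (var + mu ** 2 - 1.0 - np.log(var))
    return kl_per_dim.sum()

def encode(x, sigma_min, rng):
    """Toy encoder: whiten the input to get the latent mean, then add
    noise at exactly sigma_min, so encoder samples are valid inputs to
    a diffusion decoder at its smallest timestep."""
    mu = (x - x.mean()) / (x.std() + 1e-8)
    z = mu + sigma_min * rng.standard_normal(mu.shape)
    return mu, z

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
sigma_min = 0.05  # assumed minimum noise level of the diffusion schedule
mu, z = encode(x, sigma_min, rng)
bits = kl_bitrate_upper_bound(mu, sigma_min) / np.log(2.0)  # nats -> bits
```

Because the encoder's noise level matches the decoder's minimum schedule noise, the decoder never sees latents cleaner than it was trained on, which is what lets a single objective regularize rate and reconstruction jointly.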

๐Ÿ“ Abstract
We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
Problem

Research questions and friction points this paper is trying to address.

latent representation
diffusion prior
bitrate
reconstruction quality
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Latents
diffusion prior
latent representation
bitrate bound
efficient training