Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Latent diffusion models struggle to generate raw pixels in an end-to-end manner: they lose information during encoding, rely on a separately trained decoder, and model an auxiliary distribution rather than the raw data. This work proposes Latent Forcing, a mechanism that jointly models latent representations and pixel space through a dual-path noise schedule and a reordered denoising trajectory, letting the latent variables serve as an intermediate cache before high-frequency detail synthesis. The approach operates directly on raw images while preserving the computational efficiency of latent diffusion. It also reveals the critical role of conditioning-signal timing in generation quality and provides a unified perspective linking tokenizer distillation, conditional generation, and diffusability. Evaluated on ImageNet under comparable computational budgets, it achieves state-of-the-art performance for pixel-level image generation with diffusion transformers.

📝 Abstract
Latent diffusion models excel at generating high-quality images but lose the benefits of end-to-end modeling. They discard information during image encoding, require a separately trained decoder, and model an auxiliary distribution to the raw data. In this paper, we propose Latent Forcing, a simple modification to existing architectures that achieves the efficiency of latent diffusion while operating on raw natural images. Our approach orders the denoising trajectory by jointly processing latents and pixels with separately tuned noise schedules. This allows the latents to act as a scratchpad for intermediate computation before high-frequency pixel features are generated. We find that the order of conditioning signals is critical, and we analyze this to explain differences between REPA distillation in the tokenizer and the diffusion model, conditional versus unconditional generation, and how tokenizer reconstruction quality relates to diffusability. Applied to ImageNet, Latent Forcing achieves a new state-of-the-art for diffusion transformer-based pixel generation at our compute scale.
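The abstract's core mechanism, jointly denoising latents and pixels with separately tuned noise schedules so that latents resolve first and act as a scratchpad, can be illustrated with a minimal sketch. This is not the paper's implementation; the `offset` parameter, the clipping, and the linear (flow-matching-style) noising are assumptions for illustration only.

```python
def shifted_noise_levels(t, offset=0.3):
    """Illustrative dual-path schedule for a global time t in [0, 1],
    where t = 1 is pure noise. The latent path runs ahead of the pixel
    path, so latents are mostly clean before high-frequency pixel
    detail is generated. `offset` (hypothetical) sets how far the
    latent schedule leads the pixel schedule."""
    latent_t = max(0.0, min(1.0, t - offset))  # latent path: less noisy
    pixel_t = max(0.0, min(1.0, t + offset))   # pixel path: more noisy
    return latent_t, pixel_t

def add_noise(x, eps, t):
    """Linear interpolation between data x and noise eps at level t."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]
```

At mid-trajectory (t = 0.5 with offset 0.3), the latent path sits at noise level 0.2 while the pixel path sits at 0.8, matching the description of latents being largely resolved before pixel details appear.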
Problem

Research questions and friction points this paper is trying to address.

latent diffusion
pixel-space generation
image encoding
end-to-end modeling
diffusion trajectory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Forcing
diffusion trajectory reordering
pixel-space generation
joint latent-pixel processing
noise schedule tuning
👥 Authors
Alan Baade
Department of Computer Science, Stanford University, California, USA
Eric Ryan Chan
Stanford University
Artificial Intelligence, Graphics, Computer Vision
Kyle Sargent
Department of Computer Science, Stanford University, California, USA
Changan Chen
Stanford University
computer vision, multimodal learning, embodied AI
Justin Johnson
Department of Computer Science and Engineering, University of Michigan, Michigan, USA
Ehsan Adeli
Stanford University
Computer Vision, Computational Neuroscience, Precision Healthcare, Ambient Intelligence
Li Fei-Fei
Professor of Computer Science, Stanford University
Artificial Intelligence, Machine Learning, Computer Vision, Neuroscience