PixelFlow: Pixel-Space Generative Models with Flow

📅 2025-04-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the reliance on pretrained VAEs, and the computational overhead and architectural complexity that latent-space mapping adds to image generation, this paper proposes PixelFlow, a family of end-to-end trainable, flow-based generative models that operate purely in pixel space. PixelFlow removes the VAE encoder and decoder entirely, so no latent-space projection is involved; the flow-based (flow matching) model is instead trained directly on raw pixels. An efficient cascade flow design keeps high-resolution (256×256) modeling affordable. On class-conditional ImageNet 256×256 generation, PixelFlow achieves an FID of 1.98. In text-to-image synthesis, qualitative results show strong detail fidelity, semantic control, and artistic expressiveness.
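The summary describes a flow-based model trained directly on raw pixels with no VAE in the loop. As a minimal sketch, assuming a standard linear-interpolation (rectified-flow style) flow-matching objective rather than PixelFlow's exact formulation, one evaluation of the training loss could look like the following. `flow_matching_loss` and `zero_model` are hypothetical names, and the model is a placeholder standing in for a trained network.

```python
import numpy as np


def flow_matching_loss(velocity_model, images, rng):
    """Flow-matching loss computed directly on pixels (no VAE encoder/decoder).

    images: float array (B, C, H, W), e.g. scaled to [-1, 1].
    velocity_model: hypothetical callable (x_t, t) -> predicted velocity, same shape as x_t.
    """
    batch = images.shape[0]
    noise = rng.standard_normal(images.shape)           # x_0 ~ N(0, I), same shape as the pixels
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1, 1))    # one time value per sample, broadcast over C, H, W
    x_t = t * images + (1.0 - t) * noise                # straight-line path: noise at t=0, data at t=1
    target_velocity = images - noise                    # d(x_t)/dt along that path
    predicted = velocity_model(x_t, t)
    return float(np.mean((predicted - target_velocity) ** 2))


rng = np.random.default_rng(0)
images = rng.uniform(-1.0, 1.0, size=(2, 3, 8, 8))     # toy batch standing in for real images
zero_model = lambda x_t, t: np.zeros_like(x_t)          # placeholder model, not a trained network
loss = flow_matching_loss(zero_model, images, rng)
print(loss)
```

Putting noise at t=0 and data at t=1 is one common convention; implementations differ on the direction of the path and on how t is sampled.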

๐Ÿ“ Abstract
We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow keeps computation cost affordable in pixel space. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
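The abstract attributes the affordable pixel-space cost to cascade flow modeling. The arithmetic below is an illustrative sketch, not the paper's configuration: the patch size of 4 and the 32/64/128/256 stage resolutions are assumptions chosen for round numbers. It counts transformer tokens per stage and the relative cost of the self-attention term, which scales roughly with the square of the token count.

```python
def num_tokens(resolution: int, patch_size: int) -> int:
    """Number of non-overlapping patch tokens for a square image."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    return (resolution // patch_size) ** 2


PATCH_SIZE = 4                 # assumed patch size (illustrative only)
STAGES = [32, 64, 128, 256]    # assumed cascade resolutions (illustrative only)

full_tokens = num_tokens(STAGES[-1], PATCH_SIZE)
relative_attention = []
for res in STAGES:
    n = num_tokens(res, PATCH_SIZE)
    # Self-attention cost grows roughly with n**2 at a fixed model width.
    rel = (n / full_tokens) ** 2
    relative_attention.append(rel)
    print(f"{res}x{res}: {n} tokens, attention cost relative to final stage: {rel:.4f}")

# Average per-step attention cost if sampling steps are split evenly across stages (assumption).
mean_relative = sum(relative_attention) / len(relative_attention)
print(f"mean relative attention cost: {mean_relative:.3f}")
```

With these assumed numbers the stages use 64, 256, 1,024, and 4,096 tokens, and an even split of steps gives an average per-step attention cost of about 27% of running every step at 256×256. Linear layers scale with the token count rather than its square, so the overall compute fraction would sit somewhat higher than this attention-only figure.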
Problem

Research questions and friction points this paper is trying to address.

Generating images directly in raw pixel space
Removing the need for a pre-trained VAE
Keeping pixel-space computation affordable at high resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct pixel-space image generation models
End-to-end trainable without a pre-trained VAE
Efficient cascade flow modeling technique
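The cascade idea listed above, generating at low resolution first and refining at progressively higher resolutions, can be illustrated with a minimal sampling sketch. Assumptions, none taken from the paper: the time interval [0, 1] is split evenly across stages, the ODE is integrated with plain Euler steps, the state is upsampled by nearest-neighbour repetition between stages, and `velocity_model` is a hypothetical callable standing in for a trained network. PixelFlow's actual stage-transition handling and solver may differ.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def cascade_sample(velocity_model, stages, steps_per_stage, channels=3, rng=None):
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1, one resolution stage at a time.

    stages: increasing square resolutions, each an integer multiple of the previous one.
    velocity_model: hypothetical callable (x, t, resolution) -> velocity with x's shape.
    """
    if rng is None:
        rng = np.random.default_rng()
    boundaries = np.linspace(0.0, 1.0, len(stages) + 1)   # equal time span per stage (assumption)
    x = rng.standard_normal((channels, stages[0], stages[0]))
    for i, res in enumerate(stages):
        if i > 0:
            # Simplification: plain upsampling between stages, no re-noising or other correction.
            x = upsample_nearest(x, res // stages[i - 1])
        t_start, t_end = boundaries[i], boundaries[i + 1]
        dt = (t_end - t_start) / steps_per_stage
        for k in range(steps_per_stage):
            t = t_start + k * dt
            x = x + dt * velocity_model(x, t, res)          # one explicit Euler step
    return x


toy_model = lambda x, t, res: -x    # placeholder velocity, not a trained network
sample = cascade_sample(toy_model, stages=[8, 16, 32], steps_per_stage=4,
                        rng=np.random.default_rng(0))
print(sample.shape)  # → (3, 32, 32)
```

Most Euler steps here run on small arrays, and only the last stage touches the full resolution. The `res` argument marks where a single shared network could be conditioned on the current stage.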