🤖 AI Summary
To address the reliance on pretrained VAEs, as well as the computational overhead and architectural complexity introduced by latent-space mapping in image generation, this paper proposes PixelFlow, the first end-to-end trainable, purely pixel-space flow-based generative model. PixelFlow eliminates VAE encoders and decoders entirely, avoiding any latent-space projection, and instead learns a flow-based architecture that operates directly on pixels. It employs an efficient cascade flow design to make high-resolution modeling (256×256) tractable. On class-conditional ImageNet generation, PixelFlow achieves an FID of 1.98, substantially outperforming prior pixel-space methods. Moreover, in text-to-image synthesis, it demonstrates strong detail fidelity, semantic controllability, and artistic expressiveness.
📄 Abstract
We present PixelFlow, a family of image generation models that operate directly in raw pixel space, in contrast to the predominant latent-space models. This approach simplifies the image generation process by eliminating the need for a pre-trained Variational Autoencoder (VAE) and making the whole model end-to-end trainable. Through efficient cascade flow modeling, PixelFlow achieves affordable computation cost in pixel space. It achieves an FID of 1.98 on the 256×256 ImageNet class-conditional image generation benchmark. The qualitative text-to-image results demonstrate that PixelFlow excels in image quality, artistry, and semantic control. We hope this new paradigm will inspire and open up new opportunities for next-generation visual generation models. Code and models are available at https://github.com/ShoufaChen/PixelFlow.
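To make the cascade idea concrete, here is a minimal toy sketch of multi-stage pixel-space flow sampling: integrate a velocity field from noise toward a sample at a low resolution, then upsample and re-noise before refining at the next resolution. Everything here is illustrative and hypothetical (`toy_velocity`, the Euler step count, and the 0.7/0.3 noise-mixing schedule are stand-ins, not the paper's actual network, solver, or schedule); it only shows the shape of a cascade flow pipeline, not PixelFlow itself.

```python
import numpy as np

def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity network; a real model
    # would predict the flow that transports noise toward an image.
    return -x * (1.0 - t)

def euler_flow(x, steps=8):
    # Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (sample)
    # with a simple fixed-step Euler solver.
    dt = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        x = x + dt * toy_velocity(x, t)
        t += dt
    return x

def cascade_sample(resolutions=(64, 128, 256), channels=3, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1 starts from pure Gaussian noise at the lowest resolution.
    x = rng.standard_normal((resolutions[0], resolutions[0], channels))
    for i, res in enumerate(resolutions):
        x = euler_flow(x)
        if i + 1 < len(resolutions):
            # Upsample (nearest-neighbor) and re-noise for the next stage;
            # this mixing schedule is illustrative, not the paper's.
            scale = resolutions[i + 1] // res
            x = np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
            x = 0.7 * x + 0.3 * rng.standard_normal(x.shape)
    return x

sample = cascade_sample()
print(sample.shape)  # (256, 256, 3)
```

The point of the cascade is cost: most solver steps run at low resolution, and only the final refinement stage pays full 256×256 pixel-space compute.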