🤖 AI Summary
This work addresses the challenge of achieving high-quality image generation with very few inference steps, avoiding the high computational cost of conventional diffusion models. The authors propose a spherical encoder framework that, for the first time, models the image latent space as a uniform hypersphere, enabling efficient single-step generation trained solely with a reconstruction loss. In this framework, an encoder maps images onto the spherical latent space, and a decoder generates images from randomly sampled points on the sphere, naturally supporting both conditional generation and iterative refinement. Experiments demonstrate that the method achieves generation quality comparable to state-of-the-art diffusion models across multiple datasets using only 1–5 inference steps, substantially reducing computational overhead.
📝 Abstract
We introduce the Sphere Encoder, an efficient generative framework that produces images in a single forward pass and, with fewer than five steps, competes with many-step diffusion models. Our approach learns an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder yields performance competitive with state-of-the-art diffusion models at a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io.
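The sampling procedure described in the abstract — decode a random point on the sphere, optionally looping the encoder/decoder for refinement — can be sketched as follows. The trained `decoder` and `encoder` networks are the paper's learned models and are only stubbed here; what the sketch shows is the standard way to draw a uniform sample on the unit hypersphere (normalize an isotropic Gaussian draw), which is the latent-sampling step the framework relies on:

```python
import numpy as np

def sample_sphere_latent(dim: int, rng=None) -> np.ndarray:
    """Draw a point uniformly from the unit hypersphere S^(dim-1).

    Normalizing an isotropic Gaussian vector yields a uniform
    distribution on the sphere, since the Gaussian is rotation-invariant.
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)

# Hypothetical stand-ins for the paper's learned networks (not part of the paper).
def decoder(z: np.ndarray) -> np.ndarray:
    return z  # placeholder: a real decoder maps the latent to an image

def encoder(x: np.ndarray) -> np.ndarray:
    z = x  # placeholder: a real encoder maps an image to the sphere
    return z / np.linalg.norm(z)

# Single-step generation: decode one random point on the sphere.
z = sample_sphere_latent(512)
image = decoder(z)

# Optional refinement: loop the encoder/decoder a few times (1-5 steps total).
for _ in range(4):
    image = decoder(encoder(image))

print(round(float(np.linalg.norm(z)), 6))  # latent lies on the unit sphere
```

The Gaussian-normalization trick is exact (not approximate) because the standard normal in any dimension is rotationally symmetric; the placeholder networks above would be replaced by the trained Sphere Encoder models.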