InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

📅 2025-09-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing diffusion models incur computational cost that grows quadratically with resolution, so generating a high-resolution image (e.g., 4K) takes over 100 seconds per image. This severely hinders deployment across heterogeneous devices that need a consistent visual experience. To address this, we propose InfGen, a resolution-agnostic paradigm for arbitrary-resolution image generation. InfGen operates without modifying or retraining the original latent diffusion model (LDM). It takes the fixed-size latent produced by the LDM as input and replaces the VAE decoder with a one-step generator that decodes images at any requested resolution. Because the diffusion model itself is untouched, InfGen stays compatible with pretrained LDMs that share the same latent space, and it reduces 4K image generation time to under 10 seconds. This yields substantial efficiency gains and greater deployment flexibility, moving latent diffusion models toward practical high-resolution image synthesis.

📝 Abstract
Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for producers and consumers. The computational demand of current diffusion models grows quadratically with resolution, so generating a 4K image takes over 100 seconds. To solve this, we explore a second generation stage on top of latent diffusion models: the fixed-size latent generated by the diffusion model is treated as the content representation, and a one-step generator decodes images at arbitrary resolution from this compact latent. We thus present InfGen, which replaces the VAE decoder with the new generator to produce images at any resolution from a fixed-size latent without retraining the diffusion model. This simplifies the process, reduces computational complexity, and applies to any model that uses the same latent space. Experiments show that InfGen brings many models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
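
To make the quadratic-scaling claim concrete, the following is a back-of-the-envelope sketch, not a measurement from the paper. It assumes a transformer-style latent diffusion model with a VAE downsampling factor of 8 and a patch size of 2 (both hypothetical defaults chosen for illustration), and it assumes self-attention cost grows with the square of the token count. The function names are illustrative only.

```python
def latent_tokens(height, width, vae_factor=8, patch=2):
    """Token count a latent transformer would process for an image of this size.

    vae_factor and patch are illustrative assumptions, not values from the paper.
    """
    stride = vae_factor * patch
    return (height // stride) * (width // stride)


def relative_attention_cost(height, width, base=(1024, 1024)):
    """Self-attention cost relative to a base resolution, assuming cost ~ tokens**2."""
    n = latent_tokens(height, width)
    n0 = latent_tokens(*base)
    return (n / n0) ** 2


if __name__ == "__main__":
    for h, w in [(1024, 1024), (2160, 3840)]:
        print(h, w, latent_tokens(h, w), round(relative_attention_cost(h, w), 1))
```

Under these assumptions, a 3840x2160 image has roughly 8 times the tokens of a 1024x1024 image, and the attention cost is roughly 60 times higher. Decoding from a fixed-size latent avoids this because the diffusion model never sees the larger grid.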

Problem

Research questions and friction points this paper is trying to address.

Enabling scalable image synthesis across arbitrary resolutions
Reducing computational complexity in high-resolution image generation
Accelerating 4K image generation while maintaining quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Resolution-agnostic image synthesis paradigm
One-step generator replaces VAE decoder
Fixed-size latent enables arbitrary resolution generation
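
The interface implied by the points above (one fixed-size latent in, an image of any requested size out) can be illustrated with a toy stand-in. The sketch below uses plain bilinear interpolation over a single-channel latent grid. It is not the InfGen generator, which is a learned one-step model; the function name `decode_at_resolution` and the single-channel list-of-lists latent are assumptions made purely for illustration.

```python
def decode_at_resolution(latent, out_h, out_w):
    """Toy stand-in for a resolution-agnostic decoder (bilinear resampling only).

    latent: fixed-size 2D grid (list of rows of floats).
    Returns an out_h x out_w grid sampled from the same latent.
    """
    h, w = len(latent), len(latent[0])

    def coords(n_out, n_in):
        # Map output indices onto latent coordinates, endpoints aligned.
        if n_out == 1:
            return [0.0]
        return [i * (n_in - 1) / (n_out - 1) for i in range(n_out)]

    out = []
    for y in coords(out_h, h):
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for x in coords(out_w, w):
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = latent[y0][x0] * (1 - wx) + latent[y0][x1] * wx
            bottom = latent[y1][x0] * (1 - wx) + latent[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


if __name__ == "__main__":
    latent = [[0.0, 1.0], [2.0, 3.0]]  # the same latent serves every output size
    small = decode_at_resolution(latent, 3, 5)
    large = decode_at_resolution(latent, 9, 16)
    print(len(small), len(small[0]), len(large), len(large[0]))
```

The point of the sketch is the calling convention: the latent is produced once, and only the decoding step depends on the target resolution.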