Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models incur significant redundant computation across prompts when generating image sets. To address this, the paper proposes the first training-free hierarchical computation-reuse framework: (1) prompts are clustered by semantic similarity, so the early denoising steps, which capture coarse structure shared by similar prompts, are computed once per cluster; (2) an unCLIP prior guides dynamic step allocation, adaptively determining how many diffusion steps each prompt needs; and (3) the method integrates seamlessly with mainstream diffusion pipelines. The approach substantially reduces computational cost, particularly for large-scale image generation, while preserving output fidelity, improving both environmental sustainability and deployment efficiency. The core contribution is a cross-prompt computation-reuse paradigm, realized through an efficient, general-purpose, training-free framework.
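The coarse-to-fine reuse idea in the summary can be sketched as follows. This is a toy illustration, not the authors' implementation: `denoise_step` stands in for a real diffusion denoising step, the cluster centroid is assumed as the shared representative embedding, and the step counts are arbitrary.

```python
# Hypothetical sketch of cross-prompt computation reuse (illustrative only).
# Early "coarse" steps run once per cluster on a representative embedding;
# the shared latent is then branched and refined per prompt.

def denoise_step(latent, emb, t):
    # Stand-in for one denoising step: nudge the latent toward the embedding.
    return [x + 0.1 * (e - x) for x, e in zip(latent, emb)]

def generate_set(cluster, total_steps=10, shared_steps=4):
    """cluster: list of prompt embeddings (lists of floats) judged similar."""
    # Representative embedding: the cluster centroid (an assumption).
    centroid = [sum(vals) / len(cluster) for vals in zip(*cluster)]
    latent = [0.0] * len(centroid)
    # Coarse phase: early steps computed once for the whole cluster.
    for t in range(shared_steps):
        latent = denoise_step(latent, centroid, t)
    # Fine phase: branch the shared latent, refine per individual prompt.
    outputs = []
    for emb in cluster:
        z = list(latent)
        for t in range(shared_steps, total_steps):
            z = denoise_step(z, emb, t)
        outputs.append(z)
    return outputs

# Two similar prompts sharing 4 of 10 steps cost 4 + 2*6 = 16 step
# evaluations instead of 2*10 = 20 without sharing.
outs = generate_set([[1.0, 0.0], [0.9, 0.1]])
```

The savings grow with cluster size: for a cluster of n prompts, the coarse phase is amortized n ways, which is why the paper emphasizes large-scale image sets.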

📝 Abstract
Text-to-image diffusion models enable high-quality image generation but are computationally expensive. While prior work optimizes per-inference efficiency, we explore an orthogonal approach: reducing redundancy across correlated prompts. Our method leverages the coarse-to-fine nature of diffusion models, where early denoising steps capture shared structures among similar prompts. We propose a training-free approach that clusters prompts based on semantic similarity and shares computation in early diffusion steps. Experiments show that for models trained conditioned on image embeddings, our approach significantly reduces compute cost while improving image quality. By leveraging UnClip's text-to-image prior, we enhance diffusion step allocation for greater efficiency. Our method seamlessly integrates with existing pipelines, scales with prompt sets, and reduces the environmental and financial burden of large-scale text-to-image generation. Project page: https://ddecatur.github.io/hierarchical-diffusion/
Problem

Research questions and friction points this paper is trying to address.

Reducing computational redundancy across correlated text prompts
Leveraging shared structures in early diffusion steps for efficiency
Enabling scalable and eco-friendly large-scale image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clustering prompts by semantic similarity
Sharing computation in early diffusion steps
Leveraging the unCLIP text-to-image prior for dynamic diffusion step allocation
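The first innovation, clustering prompts by semantic similarity, could look roughly like the sketch below. Assumptions: prompts are represented by text embeddings (e.g. CLIP embeddings, as the abstract's mention of image-embedding-conditioned models suggests), and the greedy cosine-similarity threshold scheme is illustrative, not the authors' actual clustering algorithm.

```python
# Minimal sketch of semantic prompt clustering (illustrative; the threshold
# scheme and 0.9 cutoff are assumptions, not the paper's method).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_prompts(embs, threshold=0.9):
    """Greedily group embeddings whose cosine similarity to a cluster's
    seed embedding exceeds the threshold."""
    clusters = []  # each cluster is a list of indices into embs
    for i, e in enumerate(embs):
        for c in clusters:
            if cosine(e, embs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Two near-parallel embeddings cluster together; the orthogonal one does not.
groups = cluster_prompts([[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]])
```

Each resulting cluster would then share its early denoising steps, per the second bullet above.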