Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models

๐Ÿ“… 2025-10-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Current text-to-image diffusion models rely on static text embeddings from a frozen pre-trained text encoder, which remain invariant across all denoising steps. This static conditioning prevents the model from adapting textual guidance to evolving intermediate image states, limiting alignment accuracy in multi-concept generation and text-guided editing. To address this, the authors propose Diffusion Adaptive Text Embedding (DATE), a training-free method that refines the text embedding at each diffusion timestep, conditioned on the current noisy image state. Through a differentiable optimization framework, DATE preserves the original model's generative capability while improving text-image consistency in multi-concept synthesis and editing tasks. Experiments show consistent improvements over fixed-embedding baselines across multiple benchmarks, and the implementation is publicly available.

๐Ÿ“ Abstract
Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the generative process. We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. We formulate an optimization problem and derive an update rule that refines the text embeddings at each sampling step to improve alignment and preference between the mean predicted image and the text. This allows DATE to dynamically adapt the text conditions to the reverse-diffused images throughout diffusion sampling without requiring additional model training. Through theoretical analysis and empirical results, we show that DATE maintains the generative capability of the model while providing superior text-image alignment over fixed text embeddings across various tasks, including multi-concept generation and text-guided image editing. Our code is available at https://github.com/aailab-kaist/DATE.
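To make the core idea concrete, here is a minimal, hypothetical sketch of a per-timestep embedding update. It is not the authors' derivation: the model's denoised mean prediction and the text-image alignment score are replaced by toy stand-in functions (`predicted_x0`, `alignment_score`), and the update rule is approximated by simple numerical gradient ascent on the embedding; the actual method uses the diffusion model's prediction and a derived update rule.

```python
import numpy as np

def predicted_x0(x_t, c, t):
    # Toy stand-in for the model's mean prediction of the clean image given
    # the noisy state x_t and text embedding c; in DATE this comes from the
    # diffusion model itself.
    return x_t - t * c

def alignment_score(x0_hat, target):
    # Toy stand-in for a text-image alignment score (higher is better);
    # DATE would use an alignment/preference objective instead.
    return -np.sum((x0_hat - target) ** 2)

def date_step(c, x_t, t, target, lr=0.1, eps=1e-4):
    # One embedding refinement step at timestep t: numerically estimate the
    # gradient of the alignment score w.r.t. the text embedding c, then take
    # a gradient-ascent step. A real implementation would backpropagate
    # through the model instead of using finite differences.
    grad = np.zeros_like(c)
    for i in range(c.size):
        dc = np.zeros_like(c)
        dc.flat[i] = eps
        s_plus = alignment_score(predicted_x0(x_t, c + dc, t), target)
        s_minus = alignment_score(predicted_x0(x_t, c - dc, t), target)
        grad.flat[i] = (s_plus - s_minus) / (2 * eps)
    return c + lr * grad
```

In a sampler, `date_step` would be called once (or a few times) at each reverse-diffusion timestep before the denoising step, so the text condition tracks the evolving intermediate image rather than staying fixed.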
Problem

Research questions and friction points this paper is trying to address.

Dynamic text embedding updates during diffusion process
Improving text-image alignment without retraining models
Adapting text conditions to reverse-diffused images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamically updates text embeddings per diffusion timestep
Refines embeddings via optimization using intermediate perturbed data
Adapts text conditions without requiring additional model training
๐Ÿ”Ž Similar Papers
No similar papers found.