Diffusion Mental Averages

📅 2026-03-31
🤖 AI Summary
This work addresses the challenge that existing diffusion models produce blurry outputs when naively averaging multiple images generated from the same prompt in pixel or latent space, failing to yield a clear conceptual prototype. The authors propose a novel method that aligns denoising trajectories within the semantic space of diffusion models by optimizing multiple noisy latent variables so their denoising paths progressively converge toward a shared coarse-to-fine semantic structure. This approach generates a single, photorealistic, and semantically coherent concept-averaged image, marking the first realization of semantic-level concept averaging internal to diffusion models. Integrating CLIP-based semantic clustering, Textual Inversion, and LoRA, the method supports multimodal concept clustering, reveals model biases and internal conceptual representations, and produces sharp visual summaries even for abstract concepts.
📝 Abstract
Can a diffusion model produce its own "mental average" of a concept, one that is as sharp and realistic as a typical sample? We introduce Diffusion Mental Averages (DMA), a model-centric answer to this question. While prior methods aim to average image collections, they produce blurry results when applied to diffusion samples from the same prompt. These data-centric techniques operate outside the model, ignoring the generative process. In contrast, DMA averages within the diffusion model's semantic space, as discovered by recent studies. Since this space evolves across timesteps and lacks a direct decoder, we cast averaging as trajectory alignment: optimize multiple noise latents so their denoising trajectories progressively converge toward shared coarse-to-fine semantics, yielding a single sharp prototype. We extend our approach to multimodal concepts (e.g., dogs with many breeds) by clustering samples in semantically rich spaces such as CLIP and applying Textual Inversion or LoRA to bridge CLIP clusters into diffusion space. This is, to our knowledge, the first approach that delivers consistent, realistic averages, even for abstract concepts, serving as a concrete visual summary and a lens into model biases and concept representation.
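The abstract's starting observation, that averaging samples outside the model gives blurry results, can be illustrated with a toy example. The sketch below is not the DMA method and does not use a diffusion model; it assumes 1-D step edges as stand-in "images" and a hypothetical `max_gradient` helper as a sharpness proxy. Each sample has a crisp edge at a slightly different position, and the pixel-space mean turns those edges into a gradual ramp.

```python
import numpy as np

def step_edge(width, position):
    """Toy 1-D 'image': 0 left of the edge, 1 from `position` onward."""
    return (np.arange(width) >= position).astype(float)

def max_gradient(signal):
    """Sharpness proxy (an assumption for this sketch): the largest
    jump between neighbouring pixels."""
    return float(np.abs(np.diff(signal)).max())

width = 64
# Five "samples of the same concept": the same edge at shifted positions.
positions = [24, 28, 32, 36, 40]
samples = np.stack([step_edge(width, p) for p in positions])

# Naive data-centric average, computed directly in pixel space.
pixel_mean = samples.mean(axis=0)

sample_sharpness = max_gradient(samples[0])
mean_sharpness = max_gradient(pixel_mean)

print(f"sharpness of one sample:  {sample_sharpness:.2f}")
print(f"sharpness of pixel mean:  {mean_sharpness:.2f}")
```

Every individual sample keeps a full-height jump, while the mean spreads that jump over several smaller steps. This loss of sharpness is the behaviour the abstract contrasts with averaging inside the model's semantic space.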
Problem

Research questions and friction points this paper is trying to address.

diffusion models
mental average
image generation
concept representation
semantic averaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Mental Averages
trajectory alignment
semantic space
Textual Inversion
LoRA
Phonphrm Thawatdamrongkit
VISTEC, Thailand
Sukit Seripanitkarn
VISTEC, Thailand
Supasorn Suwajanakorn
VISTEC
Vision, Deep Learning, Graphics