Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

📅 2026-05-26
🤖 AI Summary
This work addresses controllable image generation with diffusion models when large annotated datasets are unavailable. The authors condition the diffusion model on representations from a pretrained self-supervised model, then identify directions of variation in that representation space to steer generation without labeled conditions. According to the abstract, this self-conditioning improves the quality of unconditional generation and yields a conditioning space with promising smoothness and disentanglement. The authors describe the work as preliminary.
📝 Abstract
Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require extensively annotated datasets. In this preliminary work, we explore diffusion models conditioned on representations from a pre-trained self-supervised model. The self-conditioning mechanism not only improves the quality of unconditional image generation, but also provides a representation space that can be used to control the generation. We explore this conditioning space by identifying directions of variation, and demonstrate promising properties in terms of smoothness and disentanglement.
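The abstract does not say how directions of variation are identified. As a rough illustration only, the sketch below assumes PCA over a set of representation vectors and then shifts one representation along the leading direction to produce a sequence of conditioning vectors. The random array stands in for real self-supervised embeddings, and the names `principal_directions` and `traverse` are hypothetical, not taken from the paper.

```python
import numpy as np


def principal_directions(reps: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k unit-norm principal directions of an (n, d) array.

    Assumption: PCA is used here purely for illustration; the paper's
    actual direction-finding procedure is not stated in the abstract.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def traverse(rep: np.ndarray, direction: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Shift one representation along a direction; returns (len(scales), d)."""
    return rep[None, :] + scales[:, None] * direction[None, :]


# Placeholder data standing in for embeddings from a self-supervised encoder.
rng = np.random.default_rng(0)
reps = rng.normal(size=(256, 32))

directions = principal_directions(reps, k=4)
# Seven conditioning vectors, evenly spaced along the leading direction.
conditions = traverse(reps[0], directions[0], np.linspace(-3.0, 3.0, 7))
```

In a representation-conditioned setup, each row of `conditions` would be passed to the diffusion model in place of the original representation, so that the generated images can be compared across the traversal.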
Problem

Research questions and friction points this paper is trying to address.

controllable image generation
diffusion models
representation conditioning
self-supervised learning
generation control
Innovation

Methods, ideas, or system contributions that make the work stand out.

representation-conditioned diffusion
self-supervised learning
controllable generation
disentangled representation
diffusion models