Controllable Coupled Image Generation via Diffusion Models

πŸ“… 2025-06-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the problem of controllable multi-image co-generation, where the goal is to generate multiple images sharing a consistent background while enabling the central object to vary flexibly according to distinct text prompts. The proposed method is a text-driven diffusion model that explicitly decouples background and foreground representations. Its key contributions are: (1) the first introduction of a time-varying weight decoupling mechanism within the cross-attention layers of diffusion models, enabling explicit separation of background and foreground features; and (2) a multi-objective sampling optimization framework that jointly enhances background coupling, text–image alignment, and visual fidelity. Extensive experiments demonstrate that the method significantly outperforms existing approaches in background consistency, text fidelity, and image quality. By enabling precise, prompt-conditioned foreground manipulation without compromising background coherence, it establishes a novel paradigm for controllable image generation.
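The summary's central mechanism is a cross-attention layer whose attention logits toward background-describing text tokens and entity-describing text tokens are rescaled by separate, time-step-dependent weights. A minimal numpy sketch of that idea follows; the function name, the boolean token mask, and the scalar weights `w_bg`/`w_fg` are all illustrative stand-ins, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_cross_attention(q, k, v, bg_mask, w_bg, w_fg):
    """Cross-attention with background/foreground logits rescaled by
    per-time-step weights (hypothetical sketch of the summarized idea).

    q: (n_img, d) image-feature queries
    k, v: (n_tok, d) text-token keys and values
    bg_mask: (n_tok,) boolean, True for background-describing tokens
    w_bg, w_fg: scalar weights for the current sampling step
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])   # (n_img, n_tok)
    # Scale attention toward background vs. entity tokens separately;
    # a schedule over sampling steps would supply (w_bg, w_fg) per step.
    scale = np.where(bg_mask, w_bg, w_fg)     # (n_tok,)
    attn = softmax(logits * scale, axis=-1)
    return attn @ v
```

In a diffusion sampler, `(w_bg, w_fg)` would be drawn from a learned sequence indexed by the denoising step, so background tokens can dominate early (global layout) while entity tokens regain influence later.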

πŸ“ Abstract
We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While the backgrounds are coupled, the centered objects in the generated images are still expected to enjoy the flexibility afforded by different text prompts. The proposed method disentangles the background and entity components in the model's cross-attention modules, attaching a sequence of time-varying weight control parameters that depend on the sampling time step. We optimize this sequence of weight control parameters with a combined objective that assesses how coupled the backgrounds are as well as text-to-image alignment and overall visual quality. Empirical results demonstrate that our method outperforms existing approaches across these criteria.
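The abstract describes optimizing the weight-control sequence against a combined objective of background coupling, text-to-image alignment, and visual quality. The sketch below shows one plausible shape for that loop: a weighted sum of the three criteria, and an exhaustive search over candidate schedules as a stand-in for the paper's actual optimizer. The evaluator callbacks, the lambda weights, and the search strategy are all assumptions for illustration.

```python
import numpy as np

def combined_score(images, eval_bg_coupling, eval_text_align, eval_quality,
                   lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of the three criteria named in the abstract.
    The evaluator functions and lambda weights are hypothetical."""
    a, b, c = lambdas
    return (a * eval_bg_coupling(images)
            + b * eval_text_align(images)
            + c * eval_quality(images))

def optimize_weight_schedule(candidates, sample_fn, score_fn):
    """Pick the time-varying weight schedule that maximizes the combined
    score. Exhaustive search here is a simple stand-in; the paper's own
    optimization procedure is not specified at this level of detail."""
    best, best_score = None, -np.inf
    for schedule in candidates:
        score = score_fn(sample_fn(schedule))  # generate, then evaluate
        if score > best_score:
            best, best_score = schedule, score
    return best, best_score
```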
Problem

Research questions and friction points this paper is trying to address.

Control background coupling in multi-image generation
Disentangle background and object via cross-attention
Optimize weight parameters for alignment and quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-level control for coupled image generation
Disentangles background and entity in cross-attention
Optimization of time-varying weight control parameters
πŸ”Ž Similar Papers
No similar papers found.