🤖 AI Summary
Open-domain multi-subject image generation suffers from degraded identity consistency, caused by spurious attention between subjects in the target image and by the attention mechanism's failure to reference subjects across distant positions. To address this, we propose IR-Diffusion, a training-free attention redesign for diffusion models with two core components: (1) Isolation Attention, which prevents subjects in the target image from attending to one another via token-level masking; and (2) Reposition Attention, which rescales and repositions subjects in the reference and target images to matching locations so that target tokens can better reference their counterparts. Together, these mechanisms improve both subject identity consistency and spatial layout plausibility. On open-domain multi-subject generation tasks, IR-Diffusion substantially outperforms existing training-free diffusion-based approaches, mitigating subject fusion, misplacement, and identity confusion, and establishes a zero-shot paradigm for controllable multi-subject image generation.
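The token-level masking idea behind Isolation Attention can be illustrated with a minimal sketch. All names, shapes, and the background-handling policy here are assumptions for illustration, not the authors' implementation: tokens of different subjects are blocked from attending to each other, while same-subject and background attention is left intact.

```python
# Hypothetical sketch of an isolation-style attention mask (assumed names and
# shapes; not IR-Diffusion's actual code). Tokens belonging to different
# subjects are blocked from attending to one another, while tokens of the
# same subject, and background tokens, attend normally.
import numpy as np

def isolation_mask(subject_ids: np.ndarray) -> np.ndarray:
    """subject_ids: (N,) int array; 0 = background, k > 0 = subject k.
    Returns an (N, N) boolean mask; True = attention allowed."""
    q = subject_ids[:, None]   # subject label of each query token
    k = subject_ids[None, :]   # subject label of each key token
    same_subject = q == k      # tokens of one subject may see each other
    to_background = k == 0     # any token may attend to background keys
    from_background = q == 0   # background queries may attend anywhere
    return same_subject | to_background | from_background

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with disallowed positions suppressed."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # kill cross-subject attention
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

A real diffusion pipeline would apply such a mask inside the self-attention layers of the denoising network, with subject labels derived from layout boxes or segmentation masks.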
📝 Abstract
Training-free diffusion models have achieved remarkable progress in generating multi-subject-consistent images in open-domain scenarios. The key idea of these methods is to incorporate reference subject information within the attention layers. However, existing methods still perform suboptimally when handling numerous subjects. This paper reveals two primary issues behind this deficiency. First, undesired mutual attraction between different subjects within the target image can cause multiple subjects to converge into a single entity. Second, tokens tend to reference nearby tokens, which weakens the attention mechanism when subjects occupy significantly different positions in the reference and target images. To address these issues, we propose a training-free diffusion model with Isolation and Reposition Attention, named IR-Diffusion. Specifically, Isolation Attention ensures that the multiple subjects in the target image do not reference each other, effectively eliminating subject convergence. Reposition Attention, in turn, scales and shifts each subject so that it occupies the same position in the reference and target images, allowing subjects in the target image to better reference those in the reference image and thereby maintain consistency. Extensive experiments demonstrate that IR-Diffusion significantly enhances multi-subject consistency, outperforming all existing methods in open-domain scenarios.
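The geometric step of Reposition Attention, moving a reference subject to its target-image location so that position-biased attention finds it nearby, can be sketched as a crop-and-resize on a feature map. Function names, box conventions, and the nearest-neighbor resampling are assumptions for illustration only:

```python
# Hypothetical sketch of repositioning a subject's reference features to its
# target location (assumed names/shapes; not the paper's implementation).
# The subject patch is cropped from the reference feature map, rescaled to
# the target box size, and pasted at the target location.
import numpy as np

def reposition(ref_feat, ref_box, tgt_box):
    """ref_feat: (H, W, C) reference feature map.
    ref_box, tgt_box: (y0, y1, x0, x1) subject boxes in reference/target.
    Returns a feature map with the subject moved to tgt_box
    (nearest-neighbor resampling)."""
    out = np.zeros_like(ref_feat)
    ry0, ry1, rx0, rx1 = ref_box
    ty0, ty1, tx0, tx1 = tgt_box
    th, tw = ty1 - ty0, tx1 - tx0
    # map each target-box pixel back to a pixel inside the reference box
    ys = ry0 + ((np.arange(th) + 0.5) * (ry1 - ry0) / th).astype(int)
    xs = rx0 + ((np.arange(tw) + 0.5) * (rx1 - rx0) / tw).astype(int)
    out[ty0:ty1, tx0:tx1] = ref_feat[ys][:, xs]
    return out
```

After this remap, the reference keys and values for a subject sit at the same spatial coordinates as its target tokens, so locality-biased attention naturally references the correct subject.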