SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation

📅 2024-09-02
🤖 AI Summary
Current text-to-image diffusion models suffer from semantic entanglement in multi-concept generation (e.g., images with multiple characters or objects), which manifests as concept confusion and attribute misbinding and degrades text–image alignment. To address this, we propose the Semantic Protection (SP) framework, which requires only a text prompt as input and introduces two novel components: SP-Extraction, for unsupervised concept region localization, and SP-Attn, a protection-aware cross-attention mechanism. Integrated into the UNet backbone, SP-Attn performs region-aware token masking and feature isolation, disentangling concepts from their attributes. Our method preserves generation fidelity while improving concept localization accuracy and attribute binding correctness. On established multi-concept generation benchmarks, SP achieves state-of-the-art performance, demonstrating robust semantic decoupling without architectural or training overhead.
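The summary describes SP-Extraction only at a high level. As a rough illustration of what localizing concept regions from cross-attention can look like, the sketch below assigns each latent pixel to whichever concept's tokens it attends to most, so the resulting regions cannot overlap. This is a simple winner-take-all heuristic, not the authors' SP-Extraction; the function name `extract_concept_regions`, the min-max normalization, the `threshold` value, and the assumption that attention maps are already averaged over heads, layers, and timesteps are all hypothetical.

```python
import numpy as np


def extract_concept_regions(attn, concept_tokens, threshold=0.3):
    """Hypothetical sketch: assign each latent pixel to at most one concept.

    attn: (num_pixels, num_tokens) cross-attention probabilities, assumed to be
          already averaged over heads, layers and timesteps.
    concept_tokens: concept_tokens[k] lists the prompt token indices of concept k.
    threshold: hypothetical cutoff; pixels whose best normalized score falls
               below it are labeled background (-1).
    """
    num_pixels = attn.shape[0]
    scores = np.zeros((num_pixels, len(concept_tokens)))
    for k, tokens in enumerate(concept_tokens):
        # Total attention each pixel pays to the tokens of concept k.
        m = attn[:, tokens].sum(axis=1)
        # Min-max normalize so concepts with different attention scales compare fairly.
        spread = m.max() - m.min()
        scores[:, k] = (m - m.min()) / spread if spread > 0 else 0.0
    # Winner-take-all: every pixel gets a single concept, so regions cannot overlap.
    labels = scores.argmax(axis=1)
    labels[scores.max(axis=1) < threshold] = -1
    return labels


# Toy example: 4 pixels, prompt tokens ["cat", "dog", "red"].
attn = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
])
print(extract_concept_regions(attn, [[0], [1]]).tolist())  # [0, 0, 1, -1]
```

With "cat" at token 0 and "dog" at token 1, the first two pixels go to the cat region, the third to the dog region, and the last pixel, which barely attends to either concept, is marked as background.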


📝 Abstract
Recent text-to-image models have achieved impressive results in generating high-quality images. However, when tasked with multi-concept generation, i.e., creating images that contain multiple characters or objects, existing methods often suffer from semantic entanglement, including concept entanglement and improper attribute binding, leading to significant text-image inconsistency. We identify that semantic entanglement arises when certain regions of the latent features attend to incorrect concept and attribute tokens. In this work, we propose the Semantic Protection Diffusion Model (SPDiffusion) to address both concept entanglement and improper attribute binding using only a text prompt as input. The SPDiffusion framework introduces a novel concept region extraction method, SP-Extraction, to resolve region entanglement in cross-attention, along with SP-Attn, which protects concept regions from the influence of irrelevant attributes and concepts. To evaluate our method, we test it on existing benchmarks, where SPDiffusion achieves state-of-the-art results, demonstrating its effectiveness.
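The abstract attributes entanglement to latent regions attending to the wrong concept and attribute tokens, and says SP-Attn shields concept regions from irrelevant ones. One way to picture such protection is to mask the cross-attention logits so that a pixel inside concept i's region can only attend to shared tokens and to the tokens of concept i (including its attributes). The sketch below is a hypothetical single-head NumPy version under these assumptions, not the authors' SP-Attn: the function name, the `pixel_concept`/`token_concept` labeling (-1 for background pixels and for shared tokens such as a start-of-text token), and letting background pixels attend to every token are all illustrative choices.

```python
import numpy as np


def protected_cross_attention(q, k, v, pixel_concept, token_concept):
    """Hypothetical single-head cross-attention with region-aware token masking.

    q: (num_pixels, d) queries from the latent features.
    k, v: (num_tokens, d) and (num_tokens, d_v) keys/values from the text encoder.
    pixel_concept: (num_pixels,) concept id of each pixel, -1 for background.
    token_concept: (num_tokens,) concept id of each token, -1 for shared tokens.
    """
    logits = q @ k.T / np.sqrt(q.shape[1])
    pc = pixel_concept[:, None]
    tc = token_concept[None, :]
    # A pixel may attend to shared tokens, to everything if it is background,
    # and otherwise only to tokens of its own concept.
    allowed = (tc == -1) | (pc == -1) | (pc == tc)
    assert allowed.any(axis=1).all(), "every pixel needs at least one visible token"
    logits = np.where(allowed, logits, -np.inf)
    # Numerically stable softmax over the visible tokens only.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


# Toy prompt tokens: [<start>, "cat", "dog"]; pixels: cat region, dog region, background.
token_concept = np.array([-1, 0, 1])
pixel_concept = np.array([0, 1, -1])
q, k, v = np.zeros((3, 4)), np.ones((3, 4)), np.eye(3)
_, w = protected_cross_attention(q, k, v, pixel_concept, token_concept)
print(np.round(w, 2))
```

With all-zero queries every logit is equal, so each pixel spreads its attention uniformly over the tokens it is allowed to see: the cat pixel never attends to "dog", the dog pixel never attends to "cat", and the background pixel sees all three tokens.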
Problem

Research questions and friction points this paper is trying to address.

Addresses semantic entanglement in multi-concept text-to-image generation.
Resolves concept entanglement and improper attribute binding issues.
Improves text-image consistency using a novel concept region extraction method.
Innovation

Methods, ideas, or system contributions that make the work stand out.

SPDiffusion resolves semantic entanglement issues
Introduces SP-Extraction for concept region extraction
SP-Attn protects concept regions from irrelevant attributes and concepts
Yang Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Rui Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xuecheng Nie
MT Lab, Meitu Inc.
Haochen Li
Tsinghua University
cell-cell communication, single-cell genomics, spatial transcriptomics
Jikun Chen
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yifan Hao
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xin Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Luoqi Liu
Director of MT Lab; Meitu
Computer Vision
Ling-ling Li
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences