ToLo: A Two-Stage, Training-Free Layout-To-Image Generation Framework For High-Overlap Layouts

📅 2025-03-03
🤖 AI Summary
Existing training-free layout-to-image generation methods suffer from missing entities and attribute leakage when the input layouts overlap heavily. To address this, we propose ToLo, a training-free, two-stage framework consisting of (1) an aggregation stage and (2) a separation stage, each with its own attention-map-based loss function. For evaluation, we partition the HRS dataset by the Intersection over Union (IoU) of the input layouts, which yields a layout-to-image dataset with varying levels of overlap. Experiments on this dataset show that ToLo significantly improves the performance of existing methods on high-overlap layouts. The code and dataset are publicly released.

📝 Abstract
Recent training-free layout-to-image diffusion models have demonstrated remarkable performance in generating high-quality images with controllable layouts. These models follow a one-stage framework: they define attention-map-based losses that encourage the attention map of each concept to focus on its corresponding region. However, these models still struggle to accurately follow layouts with significant overlap, often leading to issues such as attribute leakage and missing entities. In this paper, we propose ToLo, a two-stage, training-free layout-to-image generation framework for high-overlap layouts. Our framework consists of two stages, the aggregation stage and the separation stage, each with its own loss function based on the attention map. To provide a more effective evaluation, we partition the HRS dataset based on the Intersection over Union (IoU) of the input layouts, creating a new dataset for layout-to-image generation with varying levels of overlap. Through extensive experiments on this dataset, we demonstrate that ToLo significantly enhances the performance of existing methods when dealing with high-overlap layouts. Our code and dataset are available here: https://github.com/misaka12435/ToLo.
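The IoU-based partition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how IoU is aggregated over a layout or which thresholds separate the overlap levels, so the aggregate used here (largest pairwise IoU), the threshold values, and the names `box_iou`, `layout_overlap`, and `overlap_bucket` are all assumptions for illustration, not the authors' procedure.

```python
from itertools import combinations


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_overlap(boxes):
    """Assumed aggregate: the largest IoU over all pairs of boxes."""
    return max((box_iou(a, b) for a, b in combinations(boxes, 2)), default=0.0)


def overlap_bucket(boxes, thresholds=(0.1, 0.3)):
    """Assign a layout to an overlap level; thresholds are hypothetical."""
    score = layout_overlap(boxes)
    low, high = thresholds
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so their IoU is 1/7 and the layout falls in the "medium" bucket under these assumed thresholds.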
Problem

Research questions and friction points this paper is trying to address.

Improves image generation for high-overlap layouts
Addresses attribute leakage and missing entities issues
Introduces a two-stage framework with attention-based losses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage training-free layout-to-image framework
Attention map-based losses for high-overlap layouts
New dataset with varying overlap levels for evaluation
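
As a rough illustration of what an attention-map-based loss can look like, the sketch below penalizes the fraction of a concept's attention mass that falls outside its layout box. This is a generic form of the one-stage idea described in the abstract, not ToLo's aggregation or separation loss, whose definitions are not given here. The function name `region_attention_loss`, the nested-list attention map, and the end-exclusive box convention are assumptions.

```python
def region_attention_loss(attn, box):
    """Squared fraction of attention mass lying outside the box.

    attn: 2D list (H x W) of non-negative attention weights for one concept.
    box:  (x0, y0, x1, y1) in map coordinates, end-exclusive (assumed convention).
    """
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        # No attention anywhere: treat as maximally far from the target region.
        return 1.0
    inside = sum(sum(row[x0:x1]) for row in attn[y0:y1])
    return (1.0 - inside / total) ** 2


# Half of the attention mass lies in the top-left cell, half elsewhere.
attn = [[1.0, 0.0],
        [0.0, 1.0]]
loss = region_attention_loss(attn, (0, 0, 1, 1))  # (1 - 0.5) ** 2 = 0.25
```

In a training-free setting, a loss of this kind is typically differentiated with respect to the latent at each denoising step so that the latent is nudged toward the layout. That gradient step is omitted here because it depends on the specific diffusion model.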