Training-Free Object-Background Compositional T2I via Dynamic Spatial Guidance and Multi-Path Pruning

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current text-to-image diffusion models commonly suffer from foreground bias: backgrounds are rendered at lower quality and scenes lack global coherence, which hinders controllable composition of objects and backgrounds. This work proposes a training-free sampling framework with two components. Dynamic Spatial Guidance uses a soft, timestep-dependent gating mechanism to balance foreground and background attention during sampling. Multi-Path Pruning explores several latent trajectories and filters them using internal attention statistics together with external semantic alignment signals, keeping the trajectories that better satisfy object-background constraints. Without any additional training, the method consistently improves background coherence and object-background alignment across multiple diffusion backbones. The authors also introduce a benchmark designed specifically to evaluate object-background compositionality.
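To make the idea of a soft, timestep-dependent gate more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the sigmoid schedule, the function names `soft_gate` and `rebalance_attention`, the `strength` parameter, and the choice to emphasize the background early in sampling are all assumptions made for illustration.

```python
import math


def soft_gate(step, num_steps, midpoint=0.5, sharpness=10.0):
    """Hypothetical sigmoid schedule in (0, 1).

    Close to 1 at the first sampling step and close to 0 at the last one.
    The direction of the schedule is an assumption, not taken from the paper.
    """
    progress = step / max(num_steps - 1, 1)  # 0.0 at first step, 1.0 at last
    return 1.0 / (1.0 + math.exp(sharpness * (progress - midpoint)))


def rebalance_attention(attn, fg_mask, gate, strength=0.5):
    """Shift attention mass from foreground to background, then renormalize.

    attn:     non-negative weights over spatial positions (summing to 1).
    fg_mask:  soft foreground membership per position, each in [0, 1].
    gate:     value from soft_gate; 0 leaves the weights unchanged.
    strength: hypothetical scale in [0, 1) so every factor stays positive.
    """
    scaled = [
        a * (1.0 + strength * gate * (1.0 - 2.0 * m))  # m=1 shrinks, m=0 grows
        for a, m in zip(attn, fg_mask)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy example: two positions, the first foreground, the second background.
attn = [0.7, 0.3]
fg_mask = [1.0, 0.0]
early = rebalance_attention(attn, fg_mask, soft_gate(0, 50))
late = rebalance_attention(attn, fg_mask, soft_gate(49, 50))
```

In this toy setup the background position receives more weight early in sampling than late, while the gate stays smooth rather than switching abruptly. A real system would apply a similar modulation to attention maps inside the denoiser instead of to a two-element list.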

📝 Abstract
Existing text-to-image diffusion models, while excelling at subject synthesis, exhibit a persistent foreground bias that treats the background as a passive and under-optimized byproduct. This imbalance compromises global scene coherence and constrains compositional control. To address this limitation, we propose a training-free framework that restructures diffusion sampling to explicitly account for foreground-background interactions. Our approach consists of two key components. First, Dynamic Spatial Guidance introduces a soft, timestep-dependent gating mechanism that modulates foreground and background attention during the diffusion process, enabling spatially balanced generation. Second, Multi-Path Pruning performs multi-path latent exploration and dynamically filters candidate trajectories using both internal attention statistics and external semantic alignment signals, retaining trajectories that better satisfy object-background constraints. We further develop a benchmark specifically designed to evaluate object-background compositionality. Extensive evaluations across multiple diffusion backbones demonstrate consistent improvements in background coherence and object-background compositional alignment.
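The filtering step of Multi-Path Pruning can be sketched as ranking candidate trajectories by a combined score and keeping the best few. The abstract does not say how the two signals are combined, so the weighted sum below, the `Candidate` fields, and the `weight` and `keep` parameters are assumptions chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate latent trajectory with two hypothetical scores in [0, 1]."""
    name: str
    attention_balance: float   # internal signal, e.g. derived from attention statistics
    semantic_alignment: float  # external signal, e.g. an image-text similarity score


def combined_score(candidate, weight=0.5):
    """Weighted sum of the internal and external signals (an assumed rule)."""
    return (weight * candidate.attention_balance
            + (1.0 - weight) * candidate.semantic_alignment)


def prune(candidates, keep, weight=0.5):
    """Keep the `keep` highest-scoring candidates, best first."""
    ranked = sorted(candidates,
                    key=lambda c: combined_score(c, weight),
                    reverse=True)
    return ranked[:keep]


# Toy example with four trajectories and made-up scores.
candidates = [
    Candidate("a", attention_balance=0.9, semantic_alignment=0.2),
    Candidate("b", attention_balance=0.6, semantic_alignment=0.7),
    Candidate("c", attention_balance=0.3, semantic_alignment=0.4),
    Candidate("d", attention_balance=0.5, semantic_alignment=0.9),
]
survivors = prune(candidates, keep=2)
survivor_names = [c.name for c in survivors]
```

With equal weighting, candidate "a" is dropped even though it has the highest internal score, because its external alignment is poor. In a sampling loop, a step like this would presumably be applied repeatedly, with the surviving trajectories continuing to be denoised while the rest are discarded.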

Problem

Research questions and friction points this paper is trying to address.

foreground bias
background coherence
object-background compositionality
text-to-image generation
scene coherence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free
Object-Background Composition
Dynamic Spatial Guidance
Multi-Path Pruning
Diffusion Models