🤖 AI Summary
This work identifies and formally names a previously uncharacterized phenomenon in image generation, Order-to-Space Bias (OTS), in which the order that entities are mentioned in a textual prompt spuriously influences the spatial layout and role assignment in the generated image. To systematically evaluate this bias, we introduce OTS-Bench, the first dedicated benchmark, which uses paired prompts that differ only in entity order and a suite of layout-aware evaluation metrics to assess mainstream models. We further propose targeted fine-tuning and early-generation intervention strategies to mitigate OTS. Experimental results show that these approaches substantially reduce the bias while preserving generation quality.
📝 Abstract
We study a systematic bias in modern image generation models: the mention order of entities in text spuriously determines spatial layout and entity-role binding. We term this phenomenon Order-to-Space Bias (OTS) and show that it arises in both text-to-image and image-to-image generation, often overriding grounded cues and causing incorrect layouts or swapped assignments. To quantify OTS, we introduce OTS-Bench, which isolates order effects with paired prompts differing only in entity order and evaluates models along two dimensions: homogenization and correctness. Experiments show that OTS is widespread in modern image generation models and provide evidence that it is primarily data-driven and manifests during the early stages of layout formation. Motivated by this insight, we show that both targeted fine-tuning and early-stage intervention strategies can substantially reduce OTS while preserving generation quality.
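The paired-prompt idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from OTS-Bench: the prompt template, the helper names (`make_pair`, `left_entity`, `first_mentioned_left_rate`), the hand-written detection values, and the proxy statistic itself. The proxy simply counts how often the first-mentioned entity ends up left-most; it is not the benchmark's homogenization or correctness metric, whose definitions are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    forward: str  # entities in the order (a, b)
    swapped: str  # same template, entities in the order (b, a)


def make_pair(template: str, a: str, b: str) -> PromptPair:
    """Build two prompts that differ only in entity mention order."""
    return PromptPair(
        forward=template.format(first=a, second=b),
        swapped=template.format(first=b, second=a),
    )


def left_entity(x_centres: dict[str, float]) -> str:
    """Return the entity whose box centre has the smallest x-coordinate."""
    return min(x_centres, key=x_centres.get)


def first_mentioned_left_rate(
    samples: list[tuple[str, dict[str, float]]]
) -> float:
    """Fraction of images in which the first-mentioned entity is left-most.

    Illustrative proxy only: for a neutral prompt, a value near 0.5 would
    suggest no order effect, and a value near 1.0 a strong one.
    """
    hits = sum(1 for first, xs in samples if left_entity(xs) == first)
    return hits / len(samples)


# Hypothetical template with no spatial cue, so order is the only variable.
pair = make_pair("a {first} and a {second}", "cat", "dog")

# Hypothetical detections: (first-mentioned entity, {entity: normalized x centre}).
samples = [
    ("cat", {"cat": 0.25, "dog": 0.75}),  # forward prompt, cat on the left
    ("dog", {"dog": 0.30, "cat": 0.70}),  # swapped prompt, dog on the left
    ("cat", {"cat": 0.20, "dog": 0.80}),  # forward prompt, cat on the left
    ("dog", {"cat": 0.40, "dog": 0.60}),  # swapped prompt, cat on the left
]

rate = first_mentioned_left_rate(samples)
```

In a real evaluation the x-centres would come from an object detector run on generated images, and the statistic would be aggregated over many entity pairs and seeds.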