🤖 AI Summary
Diffusion models count poorly and offer limited layout control when generating complex, high-density scenes with a specified number of objects. To address this, we propose CountLoop, a training-free framework driven by iterative structured feedback. A language-guided planner first produces an initial scene layout; a multimodal critic then assesses each generated image for object count, spatial arrangement, and attribute consistency; finally, instance-driven attention masking and compositional generation improve the separation between objects in occluded scenes. The layout is refined over multiple rounds of this feedback. On COCO Count, T2I CompBench, and two new high-instance benchmarks, our method reaches counting accuracy of up to 98% while maintaining spatial fidelity and visual quality, and it outperforms layout-based and gradient-guided baselines with a reported score of 0.97.
📝 Abstract
Diffusion models have shown remarkable progress in photorealistic image synthesis, yet they remain unreliable for generating scenes with a precise number of object instances, particularly in complex and high-density settings. We present CountLoop, a training-free framework that provides diffusion models with accurate instance control through iterative structured feedback. The approach alternates between image generation and multimodal agent evaluation, where a language-guided planner and critic assess object counts, spatial arrangements, and attribute consistency. This feedback is then used to refine layouts and guide subsequent generations. To further improve separation between objects, especially in occluded scenes, we introduce instance-driven attention masking and compositional generation techniques. Experiments on COCO Count, T2I CompBench, and two new high-instance benchmarks show that CountLoop achieves counting accuracy of up to 98% while maintaining spatial fidelity and visual quality, outperforming layout-based and gradient-guided baselines with a score of 0.97.
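The abstract describes the alternation between generation and evaluation only at a high level. The following is a minimal sketch of such a generate, critique, refine loop, not CountLoop's actual implementation. Every name in it (`count_loop`, `Feedback`, `LoopResult`, and the `toy_*` helpers) is hypothetical, as are the exact-count stopping rule and the round budget. Toy stand-ins replace the diffusion model and the multimodal critic: the "generator" simply loses any instance whose box overlaps one that has already been rendered, which loosely mimics instances merging under occlusion.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels on a 100x100 canvas
Layout = List[Box]


@dataclass
class Feedback:
    """Hypothetical critic output; a real critic would also report layout and attribute issues."""
    detected_count: int


@dataclass
class LoopResult:
    image: Any
    layout: Layout
    rounds: int
    success: bool


def count_loop(
    prompt: str,
    target_count: int,
    plan_layout: Callable[[str, int], Layout],
    generate: Callable[[str, Layout], Any],
    critique: Callable[[Any, str, int], Feedback],
    refine_layout: Callable[[Layout, Feedback, int], Layout],
    max_rounds: int = 5,
) -> LoopResult:
    """Alternate generation and evaluation until the count matches or the budget runs out."""
    if target_count < 1 or max_rounds < 1:
        raise ValueError("target_count and max_rounds must be at least 1")
    layout = plan_layout(prompt, target_count)
    image = None
    for round_idx in range(1, max_rounds + 1):
        image = generate(prompt, layout)
        feedback = critique(image, prompt, target_count)
        if feedback.detected_count == target_count:
            return LoopResult(image, layout, round_idx, True)
        if round_idx < max_rounds:
            # Keep the last layout paired with the last image on failure.
            layout = refine_layout(layout, feedback, target_count)
    return LoopResult(image, layout, max_rounds, False)


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def toy_plan(prompt: str, n: int) -> Layout:
    # Deliberately poor first plan: 20px boxes placed 10px apart, so neighbours overlap.
    return [(10 * i, 0, 10 * i + 20, 20) for i in range(n)]


def toy_generate(prompt: str, layout: Layout) -> Layout:
    # Stand-in for a diffusion model: an instance is lost if it overlaps a rendered one.
    rendered: Layout = []
    for box in layout:
        if not any(overlaps(box, r) for r in rendered):
            rendered.append(box)
    return rendered


def toy_critique(image: Layout, prompt: str, n: int) -> Feedback:
    # Stand-in for a multimodal critic: counts the instances that survived rendering.
    return Feedback(detected_count=len(image))


def toy_refine(layout: Layout, feedback: Feedback, n: int) -> Layout:
    # Re-plan on a square grid whose cells cannot overlap.
    side = math.isqrt(n - 1) + 1  # smallest side with side * side >= n
    cell = 100 // side
    size = cell * 4 // 5  # leave a gap between neighbouring boxes
    return [
        ((i % side) * cell, (i // side) * cell,
         (i % side) * cell + size, (i // side) * cell + size)
        for i in range(n)
    ]


if __name__ == "__main__":
    result = count_loop("six apples", 6, toy_plan, toy_generate, toy_critique, toy_refine)
    print(result.success, result.rounds, len(result.image))
```

In this toy setup the first plan yields only three of the six requested instances, and the grid layout from a single refinement fixes the count in the second round. Passing the planner, generator, critic, and refiner as callables keeps the loop independent of any particular model, which is one way to read the training-free framing; the real system presumably uses much richer feedback than a single integer.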