Griffin: Generative Reference and Layout Guided Image Composition

📅 2025-09-28

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

Text-to-image models generate high-fidelity images but lack fine-grained spatial controllability under pure textual supervision. To address this, we propose a training-free image synthesis framework that enables object- and part-level joint content-position control using only a single reference image and a spatial layout map (e.g., bounding boxes or masks). Our method leverages pre-trained diffusion models, where the reference image provides appearance priors, and a differentiable spatial guidance mechanism enforces precise geometric constraints. We validate our approach on multi-image compositing, part replacement, and complex scene synthesis tasks. Results demonstrate that generated images achieve both photorealism and pixel-accurate layout fidelity—significantly outperforming text-guided and existing image-conditioned methods. Crucially, our framework overcomes the granularity limitation inherent in conventional text-driven paradigms, enabling unprecedented control at the object and component level without architectural modification or fine-tuning.

Technology Category

Application Category

📝 Abstract

Text-to-image models have achieved a level of realism that enables the generation of highly convincing images. However, text-based control can be a limiting factor when more explicit guidance is needed. Defining both the content and its precise placement within an image is crucial for achieving finer control. In this work, we address the challenge of multi-image layout control, where the desired content is specified through images rather than text, and the model is guided on where to place each element. Our approach is training-free, requires a single image per reference, and provides explicit and simple control for object and part-level composition. We demonstrate its effectiveness across various image composition tasks.

Problem

Research questions and friction points this paper is trying to address.

Achieving precise multi-image layout control in generation

Providing explicit guidance for object and part-level composition

Enabling training-free image composition with single reference images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free generative reference and layout guidance

Single image per reference for object composition

Explicit control for multi-image layout placement

🔎 Similar Papers

A Survey on Future Frame Synthesis: Bridging Deterministic and Generative Approaches