🤖 AI Summary
This work addresses the challenge of jointly preserving and finely controlling identity in multi-subject image generation. We propose the first single-forward-pass multi-subject generation framework supporting both structural and spatial constraints, and the first to integrate multi-level user guidance, from coarse cues (e.g., 2D/3D bounding boxes, semantic layouts) to pixel-level signals (e.g., segmentation masks, depth maps), within a single inference pass. The framework jointly models identity embeddings, structural priors, and spatial layout representations. Trained on our synthetically constructed dataset SIGMA-SET27K, the model achieves state-of-the-art identity fidelity, image quality, and generation efficiency. Quantitative and qualitative evaluations demonstrate significant improvements in realism, controllability, and practical applicability for multi-subject synthesis.
📝 Abstract
We present SIGMA-GEN, a unified framework for multi-identity preserving image generation. Unlike prior approaches, SIGMA-GEN is the first to enable single-pass multi-subject identity-preserved generation guided by both structural and spatial constraints. A key strength of our method is its ability to support user guidance at various levels of precision -- from coarse 2D or 3D boxes to pixel-level segmentations and depth -- with a single model. To enable this, we introduce SIGMA-SET27K, a novel synthetic dataset that provides identity, structure, and spatial information for over 100k unique subjects across 27k images. Through extensive evaluation, we demonstrate that SIGMA-GEN achieves state-of-the-art performance in identity preservation, image generation quality, and speed. Code and visualizations are available at https://oindrilasaha.github.io/SIGMA-Gen/