SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation

📅 2025-10-07
🤖 AI Summary
This work addresses the challenge of preserving the identities of multiple subjects in a generated image while finely controlling their structure and placement. We propose the first framework that generates multiple identity-preserved subjects in a single forward pass under both structural and spatial constraints. A single model accepts user guidance at multiple levels of precision, from coarse-grained cues (e.g., 2D/3D bounding boxes, semantic layouts) to pixel-level signals (e.g., segmentation masks, depth maps), all within one inference pass. Our framework jointly models identity embeddings, structural priors, and spatial layout representations. Trained on our synthetic dataset SIGMA-SET27K, the model achieves state-of-the-art identity fidelity, image quality, and generation speed. Quantitative and qualitative evaluations show substantial improvements in realism, controllability, and practical applicability for multi-subject synthesis.
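The summary lists 3D bounding boxes among the coarse cues. This page does not say how SIGMA-GEN consumes them, so the following is only a hedged illustration of one standard way to relate a 3D box to image space: project its eight corners through a pinhole camera and take the enclosing 2D rectangle. The intrinsics `K`, the box parameterization (camera-frame center, size, and yaw about the vertical axis), and the function names `box3d_corners` and `project_to_2d_box` are assumptions made for this sketch.

```python
import numpy as np


def box3d_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box in camera coordinates.

    Hypothetical parameterization: center (x, y, z), size (w, h, l),
    and a rotation `yaw` about the camera's vertical (y) axis.
    """
    w, h, l = size
    # Every sign combination of the half-extents gives the 8 local corners.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    local = signs * np.array([w, h, l]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    # Rotate each corner, then translate to the box center.
    return local @ rot_y.T + np.asarray(center, dtype=float)


def project_to_2d_box(corners, K):
    """Project camera-frame corners with intrinsics K; return (x0, y0, x1, y1)."""
    if np.any(corners[:, 2] <= 0):
        raise ValueError("all corners must lie in front of the camera (z > 0)")
    uvw = corners @ K.T            # homogeneous pixel coordinates, shape (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]  # perspective divide
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return float(x0), float(y0), float(x1), float(y1)


if __name__ == "__main__":
    # Assumed 512x512 camera with focal length 500 px and centered principal point.
    K = np.array([[500.0, 0.0, 256.0], [0.0, 500.0, 256.0], [0.0, 0.0, 1.0]])
    corners = box3d_corners(center=(0.0, 0.0, 5.0), size=(1.0, 1.0, 1.0), yaw=0.0)
    print(project_to_2d_box(corners, K))
```

Taking the min/max over projected corners yields a conservative 2D box that encloses the object's projection; finer guidance such as masks or depth would then refine it.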

📝 Abstract
We present SIGMA-GEN, a unified framework for multi-identity preserving image generation. Unlike prior approaches, SIGMA-GEN is the first to enable single-pass multi-subject identity-preserved generation guided by both structural and spatial constraints. A key strength of our method is its ability to support user guidance at various levels of precision (from coarse 2D or 3D boxes to pixel-level segmentations and depth) with a single model. To enable this, we introduce SIGMA-SET27K, a novel synthetic dataset that provides identity, structure, and spatial information for over 100k unique subjects across 27k images. Through extensive evaluation, we demonstrate that SIGMA-GEN achieves state-of-the-art performance in identity preservation, image generation quality, and speed. Code and visualizations are available at https://oindrilasaha.github.io/SIGMA-Gen/
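The abstract's central claim is that one model accepts guidance ranging from coarse boxes to pixel-level segmentation and depth. How SIGMA-GEN encodes these inputs is not described here; the sketch below is only an assumed illustration of how different precision levels could be brought into one shared format (a per-subject spatial mask plus an optional depth channel) so that a single conditioning pathway can consume them. The `Guidance` type, the two-channel layout, and the zero-fill for missing depth are hypothetical choices, not the authors' design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Guidance:
    """Hypothetical per-subject guidance at one of several precision levels."""
    box: Optional[Tuple[int, int, int, int]] = None  # coarse: (x0, y0, x1, y1), end-exclusive pixels
    mask: Optional[np.ndarray] = None                # fine: HxW boolean segmentation
    depth: Optional[np.ndarray] = None               # fine: HxW depth map


def to_condition(g: Guidance, height: int, width: int) -> np.ndarray:
    """Return a 2xHxW array: channel 0 = spatial mask, channel 1 = depth (0 where unknown)."""
    cond = np.zeros((2, height, width), dtype=np.float32)
    if g.mask is not None:
        region = g.mask.astype(bool)  # pixel-level guidance is used as-is
    elif g.box is not None:
        x0, y0, x1, y1 = g.box
        region = np.zeros((height, width), dtype=bool)
        region[y0:y1, x0:x1] = True   # a coarse box becomes a filled rectangle
    else:
        raise ValueError("guidance needs at least a box or a mask")
    cond[0] = region
    if g.depth is not None:
        # Keep depth only inside the subject's region.
        cond[1] = np.where(region, g.depth, 0.0)
    return cond


def stack_subjects(guides, height: int, width: int) -> np.ndarray:
    """Stack per-subject conditions into an N x 2 x H x W array."""
    return np.stack([to_condition(g, height, width) for g in guides])


if __name__ == "__main__":
    coarse = Guidance(box=(2, 1, 6, 4))              # one subject given only a box
    mask = np.zeros((8, 8), dtype=bool)
    mask[5:7, 5:8] = True
    fine = Guidance(mask=mask, depth=np.full((8, 8), 3.0))  # one subject given mask + depth
    conds = stack_subjects([coarse, fine], 8, 8)
    print(conds.shape)
```

Treating a box as simply the least precise mask is one plausible reason a single model could serve every precision level; whether SIGMA-GEN works this way is not stated on this page.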
Problem

Enables single-pass multi-subject identity-preserved image generation
Supports user guidance from coarse boxes to pixel-level segmentation
Generates images under structural and spatial constraints with a unified framework
Innovation

Single-pass multi-subject identity-preserved generation
Supports user guidance at various precision levels
Introduces SIGMA-SET27K, a synthetic dataset with identity, structure, and spatial information for over 100k subjects across 27k images