🤖 AI Summary
This work addresses the challenge of jointly preserving and finely controlling identity in multi-subject image generation. We propose the first single-forward-pass multi-subject generation framework supporting both structural and spatial constraints, and the first to integrate multi-level user guidance, from coarse cues (e.g., 2D/3D bounding boxes, semantic layouts) to pixel-level signals (e.g., segmentation masks, depth maps), within a single inference pass. The framework jointly models identity embeddings, structural priors, and spatial layout representations. Trained on our synthetically constructed dataset SIGMA-SET27K, the model achieves state-of-the-art identity fidelity, image quality, and generation efficiency. Quantitative and qualitative evaluations demonstrate significant improvements in realism, controllability, and practical applicability for multi-subject synthesis.
📝 Abstract
We present SIGMA-GEN, a unified framework for multi-identity preserving image generation. Unlike prior approaches, SIGMA-GEN is the first to enable single-pass multi-subject identity-preserved generation guided by both structural and spatial constraints. A key strength of our method is its ability to support user guidance at various levels of precision -- from coarse 2D or 3D boxes to pixel-level segmentations and depth -- with a single model. To enable this, we introduce SIGMA-SET27K, a novel synthetic dataset that provides identity, structure, and spatial information for over 100k unique subjects across 27k images. Through extensive evaluation, we demonstrate that SIGMA-GEN achieves state-of-the-art performance in identity preservation, image generation quality, and speed. Code and visualizations are available at https://oindrilasaha.github.io/SIGMA-Gen/