IdGlow: Dynamic Identity Modulation for Multi-Subject Generation

📅 2026-02-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the stability-plasticity dilemma in multi-subject image generation: preserving each subject's identity often conflicts with modeling complex structural deformations such as age transformation. The authors propose IdGlow, a mask-free, two-stage progressive framework built on flow-matching diffusion models. Its dynamic identity modulation injects identity information through task-adaptive timestep schedules: a linear decay schedule that progressively relaxes identity constraints for natural group composition, and temporal gating that confines identity injection to a critical semantic window for age transformation. To reduce attribute leakage and semantic ambiguity without layout inputs, a vision-language model driven by failure cases ("badcases") synthesizes context-aware prompts. In the second stage, a fine-grained, group-level DPO objective with a weighted margin removes multi-subject artifacts, improves texture harmony, and recalibrates identity fidelity toward real-world distributions. On two benchmarks, direct multi-person fusion and age-transformed group generation, the authors report that IdGlow achieves a better Pareto balance between state-of-the-art facial fidelity and commercial-grade aesthetic quality than existing methods.
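The two identity-injection schedules described above can be sketched as functions of denoising progress. This is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the default values (`w_start`, `w_end`, `lo`, `hi`, `w`), and expressing time as progress from 0.0 (pure noise) to 1.0 (clean image) are all hypothetical. Mapping a sampler's timestep to progress is left to the caller, because flow-matching codebases differ on whether t = 0 or t = 1 denotes noise.

```python
from dataclasses import dataclass


@dataclass
class LinearDecaySchedule:
    """Hypothetical group-composition schedule: the identity scale decays
    linearly from w_start at the first denoising step to w_end at the last."""
    w_start: float = 1.0
    w_end: float = 0.0

    def __call__(self, progress: float) -> float:
        # progress: 0.0 = pure noise (first step), 1.0 = clean image (last step)
        progress = min(max(progress, 0.0), 1.0)
        return self.w_start + (self.w_end - self.w_start) * progress


@dataclass
class WindowGateSchedule:
    """Hypothetical age-transformation schedule: full identity scale inside an
    assumed semantic window [lo, hi] of denoising progress, zero outside it."""
    lo: float = 0.2
    hi: float = 0.6
    w: float = 1.0

    def __call__(self, progress: float) -> float:
        return self.w if self.lo <= progress <= self.hi else 0.0


def identity_scales(schedule, num_steps: int) -> list[float]:
    """Evaluate a schedule at num_steps evenly spaced denoising steps."""
    if num_steps == 1:
        return [schedule(0.0)]
    return [schedule(i / (num_steps - 1)) for i in range(num_steps)]


print(identity_scales(LinearDecaySchedule(), 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
print(identity_scales(WindowGateSchedule(), 5))   # [0.0, 1.0, 1.0, 0.0, 0.0]
```

In a sampler loop, the value at step i would scale the identity-conditioning branch (for example, the output of an identity cross-attention adapter) before it is added to the main attention output; that placement is also an assumption. The gate returns exactly zero outside its window, so no identity signal is injected at steps outside the chosen range.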


πŸ“ Abstract
Multi-subject image generation requires seamlessly harmonizing multiple reference identities within a coherent scene. However, existing methods relying on rigid spatial masks or localized attention often struggle with the "stability-plasticity dilemma," particularly failing in tasks that require complex structural deformations, such as identity-preserving age transformation. To address this, we present IdGlow, a mask-free, progressive two-stage framework built upon Flow Matching diffusion models. In the supervised fine-tuning (SFT) stage, we introduce task-adaptive timestep scheduling aligned with diffusion generative dynamics: a linear decay schedule that progressively relaxes constraints for natural group composition, and a temporal gating mechanism that concentrates identity injection within a critical semantic window, successfully preserving adult facial semantics without overriding child-like anatomical structures. To resolve attribute leakage and semantic ambiguity without explicit layout inputs, we further integrate a badcase-driven Vision-Language Model (VLM) for precise, context-aware prompt synthesis. In the second stage, we design a Fine-Grained Group-Level Direct Preference Optimization (DPO) with a weighted margin formulation to simultaneously eliminate multi-subject artifacts, elevate texture harmony, and recalibrate identity fidelity towards real-world distributions. Extensive experiments on two challenging benchmarks -- direct multi-person fusion and age-transformed group generation -- demonstrate that IdGlow fundamentally mitigates the stability-plasticity conflict, achieving a superior Pareto balance between state-of-the-art facial fidelity and commercial-grade aesthetic quality.
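The weighted-margin preference objective can be illustrated with a Diffusion-DPO-style loss adapted to flow matching, in which per-sample velocity-prediction errors stand in for log-likelihoods (lower error acts as higher implicit likelihood). This is a hypothetical sketch: the function name, the per-pair `weights` and `margins`, the weighted-mean aggregation over a group of pairs, and folding Diffusion-DPO's timestep weighting into `beta` are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_margin_dpo_loss(
    err_w_policy: Sequence[float],
    err_w_ref: Sequence[float],
    err_l_policy: Sequence[float],
    err_l_ref: Sequence[float],
    weights: Sequence[float],
    margins: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Hypothetical weighted-margin DPO loss over one group of preference pairs.

    err_*: per-pair flow-matching errors ||v_pred - v_target||^2 for the
    preferred (w) and rejected (l) sample, under the trained policy and the
    frozen reference model.
    """
    total, weight_sum = 0.0, 0.0
    for ewp, ewr, elp, elr, wt, m in zip(
        err_w_policy, err_w_ref, err_l_policy, err_l_ref, weights, margins
    ):
        # Implicit reward gap: how much more the policy improves on the
        # preferred sample than on the rejected one, relative to the reference.
        gap = -((ewp - ewr) - (elp - elr))
        # Assumed margin form: the scaled gap must exceed m before the loss saturates.
        total += -wt * log_sigmoid(beta * gap - m)
        weight_sum += wt
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    # Assumed group-level aggregation: weighted mean over the pairs.
    return total / weight_sum
```

With every margin at 0 and every weight at 1, this reduces to a plain pairwise DPO loss averaged over the group. A positive margin keeps pushing the policy until its scaled preference gap clearly exceeds that margin, and larger weights let selected pairs (for example, those with visible multi-subject artifacts) dominate the update.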
Problem

Research questions and friction points this paper is trying to address.

multi-subject generation
identity preservation
stability-plasticity dilemma
age transformation
attribute leakage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
Identity Modulation
Vision-Language Model
Direct Preference Optimization
Stability-Plasticity Dilemma
Authors
Honghao Cai, Xiaohongshu Inc.
Xiangyuan Wang, Wuhan University; research areas: Neuromorphic Vision, Image Processing, Pattern Recognition
Yunhao Bai, Xiaohongshu Inc.
Tianze Zhou, Xiaohongshu Inc.
Sijie Xu, Xiaohongshu Inc.
Yuyang Hao, The Chinese University of Hong Kong, Shenzhen
Zezhou Cui, The Chinese University of Hong Kong, Shenzhen
Yuyuan Yang, The Chinese University of Hong Kong, Shenzhen
Wei Zhu, Xiaohongshu Inc.
Yibo Chen, Xiaohongshu Inc.
Xu Tang, Xiaohongshu; homepage: https://tangxuvis.github.io/; research areas: Face Detection, Face Recognition, GAN, Video Understanding, Text Video Retrieval
Yao Hu, Zhejiang University; research areas: Machine Learning
Zhen Li, Assistant Professor, The Chinese University of Hong Kong, Shenzhen (CUHKSZ); research areas: Deep Learning, 3D Vision, Point Cloud Analysis, Protein Structure Prediction, Computational Biology