🤖 AI Summary
Current text-to-image personalization methods suffer from low identity fidelity, poor cross-identity generalization, and reliance on multiple subject-specific training samples, particularly in multi-identity personalization and fine-grained facial editing. To address these limitations, we propose DynamicID, a tuning-free, dual-stage framework for zero-shot single- and multi-identity image personalization. First, we design a Semantic-Activated Attention (SAA) mechanism with query-level activation gating that enables dynamic multi-identity injection without requiring multi-ID training samples. Second, we introduce an Identity-Motion Reconfigurator (IMR), which disentangles and recombines identity and facial-motion features via contrastive learning. Third, we release VariFace-10k, a curated large-scale dataset of 10,000 identities with 35 facial images each. Extensive experiments demonstrate state-of-the-art performance in identity fidelity, facial editability, and multi-identity generalization.
📝 Abstract
Recent advances in text-to-image generation have spurred interest in personalized human image generation, which aims to create novel images of specific human identities indicated by reference images. Although existing methods achieve high-fidelity identity preservation, they often struggle with limited multi-ID usability and inadequate facial editability. We present DynamicID, a tuning-free framework supported by a dual-stage training paradigm that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability. Our key innovations include: 1) Semantic-Activated Attention (SAA), which employs query-level activation gating to minimize disruption to the original model when injecting ID features and to achieve multi-ID personalization without requiring multi-ID samples during training; 2) Identity-Motion Reconfigurator (IMR), which leverages contrastive learning to effectively disentangle and re-entangle facial motion and identity features, thereby enabling flexible facial editing. Additionally, we have developed VariFace-10k, a curated facial dataset comprising 10k unique individuals, each represented by 35 distinct facial images. Experimental results demonstrate that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization.
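To make the "query-level activation gating" idea concrete, here is a minimal toy sketch of gated ID-feature injection into a cross-attention layer. This is not the paper's actual SAA implementation (the abstract gives no equations); `gated_id_injection`, `gate_w`, and `gate_b` are hypothetical names, and the design assumption is simply that each query token computes a sigmoid gate deciding how much ID-conditioned attention output is added on top of the base model's output, so queries with a closed gate leave the original model undisturbed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_id_injection(q, base_k, base_v, id_k, id_v, gate_w, gate_b):
    """Toy query-level activation gating (illustrative, not the paper's SAA).

    Each query attends separately to the base context and to injected ID
    features; a per-query sigmoid gate mixes the ID branch into the base
    output. A gate near 0 reproduces the base model's output exactly.
    """
    d = q.shape[-1]
    out_base = softmax(q @ base_k.T / np.sqrt(d)) @ base_v  # (n_q, d)
    out_id = softmax(q @ id_k.T / np.sqrt(d)) @ id_v        # (n_q, d)
    gate = 1.0 / (1.0 + np.exp(-(q @ gate_w + gate_b)))     # (n_q, 1)
    return out_base + gate * out_id
```

For multiple identities, one such gated branch per reference face could be summed, with the gates learned so that each query activates at most one identity's features.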
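The IMR's contrastive disentanglement can likewise be sketched with a standard InfoNCE-style objective. This is an assumption about the general shape of the loss, not the paper's actual formulation: identity embeddings of the same person under different facial motions are pulled together, while embeddings of different people are pushed apart; `info_nce` and its arguments are hypothetical names.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Toy InfoNCE over L2-normalized embeddings (illustrative only).

    Row i of `anchors` and row i of `positives` are identity embeddings of
    the same person under different facial motions; all other rows act as
    negatives. Minimizing this loss encourages identity features to be
    invariant to facial motion.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (n, n) similarity
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # NLL of matched pairs
```

A symmetric loss on motion embeddings (same motion, different identities) would complete the disentanglement, after which identity and motion codes can be freely recombined for editing.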