🤖 AI Summary
Existing tuning-free face personalization methods struggle to achieve high identity fidelity and strong text controllability at the same time. This paper proposes UniID, a unified tuning-free framework that combines the two dominant paradigms for diffusion models: mapping facial features into the text embedding space, and injecting them through adapter cross-attention layers. Its core idea is to split responsibilities between training and inference. During training, both branches learn identity features only; at inference, a normalized rescaling mechanism recovers the text controllability of the base model while letting the two identity signals reinforce each other. This preserves the pretrained diffusion prior for non-identity attributes and avoids entangling identity and text features. In experiments against six state-of-the-art methods, UniID performs best on both identity similarity (ID-Sim) and text alignment (CLIP-Score). It is presented as the first tuning-free approach to improve both metrics jointly.
📝 Abstract
Tuning-free face personalization methods have developed along two distinct paradigms: text-embedding approaches that map facial features into the text embedding space, and adapter-based methods that inject features through auxiliary cross-attention layers. While both paradigms have shown promise, existing methods struggle to simultaneously achieve high identity fidelity and flexible text controllability. We introduce UniID, a unified tuning-free framework that integrates both paradigms. Our key insight is that, when the two approaches are merged, they should mutually reinforce only identity-relevant information while preserving the original diffusion prior for non-identity attributes. We realize this through a principled training-inference strategy: during training, we employ an identity-focused learning scheme that guides both branches to capture identity features exclusively; at inference, we introduce a normalized rescaling mechanism that recovers the text controllability of the base diffusion model while enabling complementary identity signals to enhance each other. This design lets UniID achieve high-fidelity face personalization with flexible text controllability. Extensive experiments against six state-of-the-art methods demonstrate that UniID achieves superior performance in both identity preservation and text controllability. Code will be available at https://github.com/lyuPang/UniID
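The abstract does not spell out the normalized rescaling mechanism, so the following is only an illustrative sketch of one plausible reading, not the authors' formulation. It assumes an IP-Adapter-style setup in which the adapter branch contributes a separate cross-attention output, and it assumes that "normalized rescaling" means rescaling the fused output so that each query token keeps the feature norm of the text-only output. The function names, `id_scale`, and all tensor shapes are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention for 2-D arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fuse_with_norm_rescale(text_out, id_out, id_scale=1.0, eps=1e-8):
    """Add the adapter branch to the text branch, then rescale each token so
    its norm matches the text-only output (ASSUMED reading of the mechanism)."""
    fused = text_out + id_scale * id_out
    text_norm = np.linalg.norm(text_out, axis=-1, keepdims=True)
    fused_norm = np.linalg.norm(fused, axis=-1, keepdims=True)
    return fused * text_norm / (fused_norm + eps)

# Toy inputs; every shape here is made up for illustration.
rng = np.random.default_rng(0)
dim = 8
queries = rng.normal(size=(16, dim))        # image latent tokens
prompt_tokens = rng.normal(size=(5, dim))   # text prompt embeddings
id_token = rng.normal(size=(1, dim))        # face mapped into text space
adapter_tokens = rng.normal(size=(4, dim))  # face features for the adapter

# Text-embedding branch: the identity token rides along with the prompt.
text_context = np.concatenate([prompt_tokens, id_token], axis=0)
text_out = attention(queries, text_context, text_context)

# Adapter branch: a separate cross-attention over identity features.
id_out = attention(queries, adapter_tokens, adapter_tokens)

fused = fuse_with_norm_rescale(text_out, id_out, id_scale=0.8)
```

Under this reading, the adapter can change the direction of each token's features (adding identity information) without inflating their magnitude, which is one way the text pathway of the base model could keep its influence. The learned key and value projections of a real attention layer are omitted to keep the sketch short.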