Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization

📅 2025-12-03

🤖 AI Summary

Existing tuning-free face personalization methods struggle to achieve high identity fidelity and strong text controllability at the same time. This paper proposes UniID, a unified, tuning-free framework that decouples identity modeling from text control in diffusion models by injecting identity through both a text-embedding mapping and adapters. Its core innovation is a separation of concerns across stages: during training, it focuses exclusively on learning robust identity representations; during inference, it restores text controllability through normalized rescaling and cross-attention mechanisms, preserving the pretrained diffusion prior while avoiding entanglement between identity and textual features. In experiments against six state-of-the-art methods, UniID achieves the best performance on both key metrics: identity similarity (ID-Sim) and text alignment (CLIP-Score). To our knowledge, UniID is the first tuning-free approach to improve both metrics jointly, establishing a new benchmark for zero-shot face personalization.

📝 Abstract

Tuning-free face personalization methods have developed along two distinct paradigms: text embedding approaches that map facial features into the text embedding space, and adapter-based methods that inject features through auxiliary cross-attention layers. While both paradigms have shown promise, existing methods struggle to simultaneously achieve high identity fidelity and flexible text controllability. We introduce UniID, a unified tuning-free framework that synergistically integrates both paradigms. Our key insight is that when merging these approaches, they should mutually reinforce only identity-relevant information while preserving the original diffusion prior for non-identity attributes. We realize this through a principled training-inference strategy: during training, we employ an identity-focused learning scheme that guides both branches to capture identity features exclusively; at inference, we introduce a normalized rescaling mechanism that recovers the text controllability of the base diffusion model while enabling complementary identity signals to enhance each other. This principled design enables UniID to achieve high-fidelity face personalization with flexible text controllability. Extensive experiments against six state-of-the-art methods demonstrate that UniID achieves superior performance in both identity preservation and text controllability. Code will be available at https://github.com/lyuPang/UniID
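The two paradigms that UniID merges can be illustrated with a toy cross-attention layer. The sketch below is a minimal NumPy illustration under stated assumptions, not UniID's implementation: the single linear layers `W_map` and `W_adapter`, the function `unified_cross_attention`, the weight names, the `id_scale` factor, and all sizes are hypothetical. It only shows where each paradigm injects identity. In the text-embedding paradigm, the identity appears as extra pseudo-word tokens read by the pretrained key/value projections. In the adapter paradigm, it passes through a separate branch with its own key/value projections, whose output is added to the base output.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def unified_cross_attention(q, text_stream, id_feats, W, id_scale=1.0):
    # Base branch: pretrained K/V projections see the prompt tokens, which
    # (text-embedding paradigm) already include the mapped identity tokens.
    text_out = attention(q, text_stream @ W["k"], text_stream @ W["v"])
    # Adapter branch (hypothetical weights): separate K/V projections see identity features only.
    id_out = attention(q, id_feats @ W["k_id"], id_feats @ W["v_id"])
    return text_out + id_scale * id_out


rng = np.random.default_rng(0)
d, face_dim, n_img, n_txt, n_id = 8, 16, 4, 5, 2  # toy sizes (hypothetical)

face_emb = rng.standard_normal(face_dim)        # stand-in for a face-recognition embedding
prompt_tokens = rng.standard_normal((n_txt, d))  # stand-in for encoded prompt tokens
q = rng.standard_normal((n_img, d))              # queries from image latents

# Paradigm 1: map the face embedding into n_id pseudo-word tokens in text-embedding space.
W_map = rng.standard_normal((face_dim, n_id * d)) * 0.1
id_tokens = (face_emb @ W_map).reshape(n_id, d)
text_stream = np.concatenate([prompt_tokens, id_tokens], axis=0)

# Paradigm 2: project the same face embedding into adapter features.
W_adapter = rng.standard_normal((face_dim, n_id * d)) * 0.1
id_feats = (face_emb @ W_adapter).reshape(n_id, d)

W = {name: rng.standard_normal((d, d)) * 0.1 for name in ("k", "v", "k_id", "v_id")}

out = unified_cross_attention(q, text_stream, id_feats, W, id_scale=0.5)
print(out.shape)  # (4, 8)
```

Setting `id_scale` to 0 reduces the layer to the first branch alone, which is one simple way to see how the adapter contribution can be dialed up or down at inference time in this toy setup.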

Problem

Achieving high identity fidelity in face personalization without fine-tuning

Maintaining flexible text controllability while preserving facial identity

Unifying text embedding and adapter methods for tuning-free face generation

Innovation

Unified tuning-free framework integrates text embedding and adapter methods

Identity-focused training guides both branches to capture only identity features, yielding high fidelity

Normalized rescaling at inference recovers text controllability while enhancing identity
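The abstract does not specify how the normalized rescaling is computed. As one plausible reading, assumed here and not taken from the paper, the sketch below adds the identity-branch output to the text-conditioned output and then rescales each token so that its L2 norm matches the text-only output. The combined signal therefore keeps the magnitude the base diffusion model expects while its direction still carries identity information. The function name `normalized_rescale` and the per-token norm-matching rule are hypothetical.

```python
import numpy as np


def normalized_rescale(text_out: np.ndarray, id_out: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical rescaling rule (not UniID's published formula).

    Adds the identity-branch output to the text-conditioned output, then rescales
    each token (row) so its L2 norm equals that of the text-only output.
    """
    combined = text_out + id_out
    base_norm = np.linalg.norm(text_out, axis=-1, keepdims=True)
    combined_norm = np.linalg.norm(combined, axis=-1, keepdims=True)
    # eps guards against division by zero when a combined token is all zeros.
    return combined * base_norm / (combined_norm + eps)


rng = np.random.default_rng(0)
text_out = rng.standard_normal((4, 8))      # toy text-branch attention output
id_out = 3.0 * rng.standard_normal((4, 8))  # deliberately large identity-branch output

out = normalized_rescale(text_out, id_out)
print(np.linalg.norm(out, axis=-1))       # equals the per-token norms below, up to eps
print(np.linalg.norm(text_out, axis=-1))
```

Without the rescaling, the large identity output would dominate the token norms; with it, the output stays on the scale of the base model. Whether UniID normalizes per token, per attention head, or per layer is not stated in this summary.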
