InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address critical challenges in diffusion Transformer (DiT)-based identity-preserved portrait generation, namely weak identity preservation, poor text-image alignment, and low visual fidelity, this paper proposes InfuseNet, a residual feature-injection mechanism tailored for DiTs that retains fine-grained identity features without modifying the backbone architecture. It further introduces a multi-stage training strategy, combining pretraining with supervised fine-tuning on synthetic single-person-multiple-sample (SPMS) data, to mitigate text-image misalignment and facial artifacts such as face copy-pasting. The resulting framework operates as a plug-and-play module. Extensive evaluations demonstrate state-of-the-art performance on key metrics, including ID similarity, CLIP-Score, and FID, with significant gains in identity consistency, semantic controllability, and visual fidelity over existing approaches.

📝 Abstract
Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
Problem

Research questions and friction points this paper is trying to address.

Existing identity-preserved generation methods struggle with insufficient identity similarity.
Text-image alignment and overall generation quality remain poor, especially with advanced DiT backbones such as FLUX.
Face copy-pasting artifacts limit the flexibility and aesthetics of generated portraits.
Innovation

Methods, ideas, or system contributions that make the work stand out.

InfuseNet enhances identity similarity via residual connections
Multi-stage training improves text-image alignment and quality
Plug-and-play design ensures compatibility with existing methods
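The core InfuseNet idea, injecting identity features into a frozen backbone through a residual connection, can be illustrated with a minimal numerical sketch. All function names, shapes, and the zero-initialized projection below are illustrative assumptions for exposition, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone_block(hidden: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Stand-in for one frozen DiT transformer block (here, a single linear map)."""
    return hidden @ weight

def infuse(hidden: np.ndarray, id_features: np.ndarray,
           proj: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Residually inject projected identity features into the hidden states."""
    return hidden + scale * (id_features @ proj)

d_model, d_id, seq = 8, 4, 3
hidden = rng.standard_normal((seq, d_model))       # backbone hidden states
id_features = rng.standard_normal((seq, d_id))     # identity features (e.g. from a face encoder)
w_block = rng.standard_normal((d_model, d_model))  # frozen backbone weights
w_proj = np.zeros((d_id, d_model))                 # zero-init: injection starts as a no-op

out_plain = backbone_block(hidden, w_block)
out_infused = backbone_block(infuse(hidden, id_features, w_proj), w_block)

# With a zero-initialized projection, the backbone's behavior is unchanged,
# which is one reason residual injection can act as a plug-and-play module:
print(np.allclose(out_plain, out_infused))  # True
```

As training updates the projection away from zero, the injected identity signal increasingly steers generation while the backbone weights stay frozen.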
Liming Jiang
Senior Research Scientist, ByteDance / TikTok, USA
Computer Vision · Generative AI
Qing Yan
Research Scientist, ByteDance Inc
Generative model · Diffusion model · Computer vision
Yumin Jia
ByteDance Intelligent Creation
Zichuan Liu
ByteDance Intelligent Creation
Hao Kang
ByteDance Intelligent Creation
Xin Lu
ByteDance Intelligent Creation