PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes

📅 2025-10-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing personalized image generation methods struggle to achieve precise, attribute-controllable facial synthesis in a per-subject-tuning-free (PSTF) way: fine-tuning-based approaches require extensive data and domain expertise, while PSTF methods, though tuning-free, lack fine-grained attribute control. Method: We propose a PSTF personalized face generation framework that extracts facial identity features with a face recognition model, maps them into StyleGAN2's W+ latent space via the e4e encoder, and fuses them with text and attribute embeddings. A novel Triplet-Decoupled Cross-Attention module injects these signals into the UNet's cross-attention layers while explicitly disentangling identity and attribute representations, enabling flexible, plug-and-play editing. Contribution/Results: Trained on FFHQ, our approach achieves high identity fidelity while enabling precise, zero-shot attribute manipulation, eliminating the need for per-subject optimization. It significantly improves practicality, generalizability, and controllability over prior PSTF and fine-tuning methods.

📝 Abstract
Recent advancements in personalized image generation have significantly improved facial identity preservation, particularly in fields such as entertainment and social media. However, existing methods still struggle to achieve precise control over facial attributes in a per-subject-tuning-free (PSTF) way. Tuning-based techniques like PreciseControl have shown promise by providing fine-grained control over facial features, but they often require extensive technical expertise and additional training data, limiting their accessibility. In contrast, PSTF approaches simplify the process by enabling image generation from a single facial input, but they lack precise control over facial attributes. In this paper, we introduce a novel PSTF method that enables both precise control over facial attributes and high-fidelity preservation of facial identity. Our approach utilizes a face recognition model to extract facial identity features, which are then mapped into the $W^+$ latent space of StyleGAN2 using the e4e encoder. We further enhance the model with a Triplet-Decoupled Cross-Attention module, which integrates facial identity, attribute features, and text embeddings into the UNet architecture, ensuring clean separation of identity and attribute information. Trained on the FFHQ dataset, our method allows for the generation of personalized images with fine-grained control over facial attributes, without requiring additional fine-tuning or training data for individual identities. We demonstrate that our approach successfully balances personalization with precise facial attribute control, offering a more efficient and user-friendly solution for high-quality, adaptable facial image synthesis. The code is publicly available at https://github.com/UnicomAI/PSTF-AttControl.
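The abstract's central mechanism is a cross-attention block that keeps identity, attribute, and text conditioning on separate key/value pathways. The paper's actual implementation is not reproduced here; the following is a minimal NumPy sketch of the general idea, with all projection matrices random stand-ins and the function names (`cross_attention`, `triplet_decoupled_cross_attention`) my own labels, not names from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, cond, wq, wk, wv):
    # q_feats: (Nq, d) image tokens; cond: (Nc, d) conditioning tokens.
    Q, K, V = q_feats @ wq, cond @ wk, cond @ wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (Nq, Nc)
    return attn @ V                                   # (Nq, d)

def triplet_decoupled_cross_attention(x, text, ident, attr, params):
    # Three parallel cross-attention branches, one per conditioning signal,
    # with no shared key/value projections, so identity and attribute
    # information stay decoupled. Residual sum back onto the image tokens.
    out = x.copy()
    for name, cond in (("text", text), ("id", ident), ("attr", attr)):
        wq, wk, wv = params[name]
        out = out + cross_attention(x, cond, wq, wk, wv)
    return out
```

In a real diffusion UNet these branches would replace the single text cross-attention layer; the sketch keeps every dimension equal so the residual addition type-checks.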
Problem

Research questions and friction points this paper is trying to address.

Achieves precise facial attribute control without per-subject tuning
Preserves facial identity while enabling fine-grained attribute manipulation
Eliminates need for additional training data or technical expertise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses face recognition model for identity feature extraction
Maps features into StyleGAN2 latent space via e4e encoder
Integrates Triplet-Decoupled Cross-Attention module in UNet
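The three bullets above describe a pipeline: recognition features in, W+ codes out, then cross-attention injection. A minimal NumPy sketch of the first two stages follows; both networks are replaced by random linear stand-ins (the real system uses a trained face-recognition backbone and the e4e encoder), and the `18 x 512` W+ shape is the standard StyleGAN2 layout for 1024px faces.

```python
import numpy as np

LATENT_LAYERS, LATENT_DIM = 18, 512  # W+ shape for StyleGAN2 at 1024px

def extract_identity(image, recog_w):
    """Stand-in for a face-recognition backbone: flatten, project, L2-normalize."""
    feat = image.reshape(-1) @ recog_w
    return feat / np.linalg.norm(feat)

def e4e_map(id_feat, mapper_w):
    """Stand-in for the e4e encoder head: identity feature -> W+ latent codes."""
    return (id_feat @ mapper_w).reshape(LATENT_LAYERS, LATENT_DIM)

rng = np.random.default_rng(0)
face = rng.normal(size=(64, 64, 3))                    # toy input image
recog_w = rng.normal(size=(64 * 64 * 3, 512))          # hypothetical weights
mapper_w = rng.normal(size=(512, LATENT_LAYERS * LATENT_DIM))

id_feat = extract_identity(face, recog_w)   # unit-norm identity embedding
w_plus = e4e_map(id_feat, mapper_w)         # (18, 512) codes for the UNet
```

The resulting `w_plus` codes would then be fed, alongside attribute and text embeddings, into the Triplet-Decoupled Cross-Attention layers of the UNet.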
👥 Authors
Xiang Liu
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Zhaoxiang Liu
China Unicom (Computer Vision, Deep Learning, Robotics, Human-Computer Interaction)
Huan Hu
PhD student, Washington State University (analog & mixed-signal IC design)
Zipeng Wang
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Ping Chen
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Zezhou Chen
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Kai Wang
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Shiguo Lian
CloudMinds