PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes

📅 2025-10-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing personalized image generation methods struggle to achieve precise, attribute-controllable facial synthesis in a per-subject-tuning-free (PSTF) way: fine-tuning-based approaches require extensive data and domain expertise, while PSTF methods, though tuning-free, lack fine-grained attribute control. Method: We propose a PSTF personalized face generation framework that extracts facial identity features with a face recognition model, maps them into StyleGAN2's W+ latent space via the e4e encoder, and fuses them with text and attribute embeddings. A novel Triplet-Decoupled Cross-Attention module injects these signals into the UNet's cross-attention layers while explicitly disentangling identity and attribute representations, enabling flexible, plug-and-play editing. Contribution/Results: Trained on FFHQ, our approach achieves high identity fidelity while enabling precise, zero-shot attribute manipulation, eliminating the need for per-subject optimization. It significantly improves practicality, generalizability, and controllability over prior PSTF and fine-tuning methods.

📝 Abstract
Recent advancements in personalized image generation have significantly improved facial identity preservation, particularly in fields such as entertainment and social media. However, existing methods still struggle to achieve precise control over facial attributes in a per-subject-tuning-free (PSTF) way. Tuning-based techniques like PreciseControl have shown promise by providing fine-grained control over facial features, but they often require extensive technical expertise and additional training data, limiting their accessibility. In contrast, PSTF approaches simplify the process by enabling image generation from a single facial input, but they lack precise control over facial attributes. In this paper, we introduce a novel PSTF method that enables both precise control over facial attributes and high-fidelity preservation of facial identity. Our approach utilizes a face recognition model to extract facial identity features, which are then mapped into the $W^+$ latent space of StyleGAN2 using the e4e encoder. We further enhance the model with a Triplet-Decoupled Cross-Attention module, which integrates facial identity, attribute features, and text embeddings into the UNet architecture, ensuring clean separation of identity and attribute information. Trained on the FFHQ dataset, our method allows for the generation of personalized images with fine-grained control over facial attributes, without requiring additional fine-tuning or training data for individual identities. We demonstrate that our approach successfully balances personalization with precise facial attribute control, offering a more efficient and user-friendly solution for high-quality, adaptable facial image synthesis. The code is publicly available at https://github.com/UnicomAI/PSTF-AttControl.
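The abstract's central mechanism is a cross-attention block that keeps identity, attribute, and text conditioning on separate key/value pathways. The paper's actual implementation is not reproduced here; the following is a minimal NumPy sketch of the general idea, with all projection matrices random stand-ins and the function names (`cross_attention`, `triplet_decoupled_cross_attention`) my own labels, not names from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, cond, wq, wk, wv):
    # q_feats: (Nq, d) image tokens; cond: (Nc, d) conditioning tokens.
    Q, K, V = q_feats @ wq, cond @ wk, cond @ wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (Nq, Nc)
    return attn @ V                                   # (Nq, d)

def triplet_decoupled_cross_attention(x, text, ident, attr, params):
    # Three parallel cross-attention branches, one per conditioning signal,
    # with no shared key/value projections, so identity and attribute
    # information stay decoupled. Residual sum back onto the image tokens.
    out = x.copy()
    for name, cond in (("text", text), ("id", ident), ("attr", attr)):
        wq, wk, wv = params[name]
        out = out + cross_attention(x, cond, wq, wk, wv)
    return out
```

In a real diffusion UNet these branches would replace the single text cross-attention layer; the sketch keeps every dimension equal so the residual addition type-checks.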
Problem

Research questions and friction points this paper is trying to address.

Achieves precise facial attribute control without per-subject tuning
Preserves facial identity while enabling fine-grained attribute manipulation
Eliminates need for additional training data or technical expertise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses face recognition model for identity feature extraction
Maps features into StyleGAN2 latent space via e4e encoder
Integrates Triplet-Decoupled Cross-Attention module in UNet
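The three bullets above describe a pipeline: recognition features in, W+ codes out, then cross-attention injection. A minimal NumPy sketch of the first two stages follows; both networks are replaced by random linear stand-ins (the real system uses a trained face-recognition backbone and the e4e encoder), and the `18 x 512` W+ shape is the standard StyleGAN2 layout for 1024px faces.

```python
import numpy as np

LATENT_LAYERS, LATENT_DIM = 18, 512  # W+ shape for StyleGAN2 at 1024px

def extract_identity(image, recog_w):
    """Stand-in for a face-recognition backbone: flatten, project, L2-normalize."""
    feat = image.reshape(-1) @ recog_w
    return feat / np.linalg.norm(feat)

def e4e_map(id_feat, mapper_w):
    """Stand-in for the e4e encoder head: identity feature -> W+ latent codes."""
    return (id_feat @ mapper_w).reshape(LATENT_LAYERS, LATENT_DIM)

rng = np.random.default_rng(0)
face = rng.normal(size=(64, 64, 3))                    # toy input image
recog_w = rng.normal(size=(64 * 64 * 3, 512))          # hypothetical weights
mapper_w = rng.normal(size=(512, LATENT_LAYERS * LATENT_DIM))

id_feat = extract_identity(face, recog_w)   # unit-norm identity embedding
w_plus = e4e_map(id_feat, mapper_w)         # (18, 512) codes for the UNet
```

The resulting `w_plus` codes would then be fed, alongside attribute and text embeddings, into the Triplet-Decoupled Cross-Attention layers of the UNet.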
👥 Authors
Xiang Liu
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Zhaoxiang Liu
China Unicom (Computer Vision, Deep Learning, Robotics, Human-Computer Interaction)
Huan Hu
PhD student, Washington State University (analog & mixed-signal IC design)
Zipeng Wang
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Ping Chen
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Zezhou Chen
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Kai Wang
Unicom Data Intelligence, China Unicom, Beijing, 100013, P.R.China; Data Science & Artificial Intelligence Research Institute, China Unicom, Beijing, 100013, P.R.China
Shiguo Lian
CloudMinds