StyleClone: Face Stylization with Diffusion-Based Data Augmentation

📅 2025-08-23
🤖 AI Summary
To address data scarcity, limited diversity, and content distortion in few-shot facial style transfer, this paper proposes a style data augmentation framework that combines diffusion models with textual inversion. First, textual inversion learns a compact embedding of the target style from a handful of reference images. A diffusion model guided by this embedding then generates diverse stylized face images, and the augmented dataset is used to train a lightweight image-to-image translation network. Evaluated with only 2–5 style reference images, the method improves stylization quality and better preserves identity and structure across diverse artistic styles, outperforming existing few-shot approaches. Inference is reported to be 20–50× faster than running the diffusion model directly, a favorable trade-off between quality and efficiency. The paper also provides a systematic evaluation of data augmentation strategies for style transfer.
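The textual-inversion step can be pictured with a toy sketch: the generator stays frozen and only a small style embedding is optimized to reproduce the reference images. Everything here is an assumption for illustration; `FROZEN_W`, `render`, and `learn_style_embedding` are hypothetical names, a fixed linear map stands in for the frozen diffusion model, and plain squared error stands in for the denoising loss. It is not the paper's implementation.

```python
# Toy illustration only: a frozen linear map stands in for the frozen
# diffusion model, and squared error stands in for the denoising loss.

FROZEN_W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical frozen weights (3x2)


def render(embedding):
    """Map a 2-D style embedding to a 3-D 'image' through the frozen weights."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_W]


def loss(embedding, references):
    """Mean squared error between the rendering and each reference image."""
    out = render(embedding)
    total = sum(sum((o - r) ** 2 for o, r in zip(out, ref)) for ref in references)
    return total / len(references)


def learn_style_embedding(references, steps=500, lr=0.05):
    """Gradient descent on the embedding alone; FROZEN_W is never updated."""
    emb = [0.0, 0.0]
    for _ in range(steps):
        out = render(emb)
        # Mean residual over the reference set, one value per output dimension.
        resid = [
            sum(o - ref[i] for ref in references) / len(references)
            for i, o in enumerate(out)
        ]
        # d(loss)/d(emb[j]) = 2 * sum_i resid[i] * FROZEN_W[i][j]
        grad = [
            2.0 * sum(resid[i] * FROZEN_W[i][j] for i in range(len(FROZEN_W)))
            for j in range(len(emb))
        ]
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb
```

With two or three references, `learn_style_embedding` settles on the embedding whose rendering best matches their average, which mirrors why a handful of style images can be enough to pin down a single style token.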

📝 Abstract
We present StyleClone, a method for training image-to-image translation networks to stylize faces in a specific style, even with limited style images. Our approach leverages textual inversion and diffusion-based guided image generation to augment small style datasets. By systematically generating diverse style samples guided by both the original style images and real face images, we significantly enhance the diversity of the style dataset. Using this augmented dataset, we train fast image-to-image translation networks that outperform diffusion-based methods in speed and quality. Experiments on multiple styles demonstrate that our method improves stylization quality, better preserves source image content, and significantly accelerates inference. Additionally, we provide a systematic evaluation of the augmentation techniques and their impact on stylization performance.
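One way to read "guided by both the original style images and real face images" is as a grid of generation jobs, each pairing a guide image with the learned style token. The sketch below only builds that job list and does not call a diffusion model. The `<style>` token, the prompt wording, and the `strength` and `seed` knobs are assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

STYLE_TOKEN = "<style>"  # hypothetical placeholder token learned via textual inversion


@dataclass(frozen=True)
class AugmentationJob:
    guide_image: str  # path of the image that guides generation
    guide_kind: str   # "style" (original style image) or "face" (real face photo)
    prompt: str       # text prompt containing the learned style token
    strength: float   # assumed knob: how far generation may drift from the guide
    seed: int         # assumed knob: random seed for sample diversity


def build_augmentation_plan(style_images, face_images,
                            strengths=(0.4, 0.6), seeds=(0, 1)):
    """Return one job per (guide image, strength, seed) combination."""
    guides = [(p, "style") for p in style_images]
    guides += [(p, "face") for p in face_images]
    prompt = f"a portrait in {STYLE_TOKEN} style"
    return [
        AugmentationJob(path, kind, prompt, strength, seed)
        for (path, kind), strength, seed in product(guides, strengths, seeds)
    ]
```

For example, two style images and three face photos with the default settings yield 20 jobs, 12 of them face-guided; the generated outputs would then form the augmented set used to train the image-to-image network.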

Problem

Research questions and friction points this paper is trying to address.

Augmenting limited style datasets for face stylization
Training fast image-to-image translation networks
Improving stylization quality while preserving content

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based guided image generation
Textual inversion for data augmentation
Fast image-to-image translation networks

Authors
Neeraj Matiyali
Indian Institute of Technology, Kanpur
Siddharth Srivastava
Arizona State University
Artificial Intelligence, Automated Planning, Robotics, Task and Motion Planning, AI Assessment
Gaurav Sharma
Indian Institute of Technology, Kanpur