StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

📅 2024-12-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Text-driven style transfer faces challenges including reference-style overfitting, insufficient fine-grained control, and text–style misalignment. This paper proposes a fine-grained controllable approach enabling explicit modulation of stylistic elements—such as color, texture, and brushstrokes—while preserving high semantic fidelity between generated content and textual prompts. Our method, built upon a diffusion architecture, integrates adaptive cross-modal instance normalization (for joint style–text modeling), style-conditioned classifier-free guidance (SCFG), and a teacher-model-assisted layout stabilization mechanism. It synergistically combines AdaIN, classifier-free guidance, and knowledge distillation, requiring no fine-tuning and supporting plug-and-play deployment. Quantitative evaluation shows a 23.6% reduction in FID, alongside significant improvements in text alignment and style fidelity. User studies confirm a 41% increase in perceived stylistic controllability.

Technology Category

Application Category

📝 Abstract
Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly with overfitting to reference styles, limiting stylistic control, and misaligning with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism for better integration of style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Overfitting to reference styles limits control
Misalignment between style and text content
Lack of selective control in style elements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal AdaIN for style-text integration
Style-based Classifier-Free Guidance for control
Teacher model stabilizes early generation stages
🔎 Similar Papers
Mingkun Lei
Mingkun Lei
Westlake University
image and video synthesis
X
Xue Song
Fudan University
Beier Zhu
Beier Zhu
Research Scientist, Nanyang Technological University
Robust Machine Learning
H
Hao Wang
The Hong Kong University of Science and Technology (Guangzhou)
C
Chi Zhang
AGI Lab, Westlake University