DiffArtist: Towards Structure and Appearance Controllable Image Stylization

📅 2024-07-22
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Existing neural style transfer methods primarily transfer appearance (e.g., color, texture) and neglect controllable structural stylization (e.g., composition, contours). Method: The paper presents DiffArtist, which is, to the authors' knowledge, the first stylization method to offer dual control over the structure and appearance of 2D images. It represents structure and appearance as separate diffusion processes, which disentangles the two without any training. The authors also propose a multimodal large language model (MLLM)-based style evaluator that aligns better with human preferences than metrics lacking semantic understanding. Contribution/Results: Analysis with this evaluator indicates that DiffArtist achieves superior style fidelity, editability, and structure-appearance disentanglement. The method supports creative applications and artistic editing without model retraining or fine-tuning.

📝 Abstract
Artistic style includes both structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance features such as color and texture, often neglecting the equally crucial aspect of structural stylization. In this paper, we present a comprehensive study on the simultaneous stylization of structure and appearance of 2D images. Specifically, we introduce DiffArtist, which, to the best of our knowledge, is the first stylization method to allow for dual controllability over structure and appearance. Our key insight is to represent structure and appearance as separate diffusion processes to achieve complete disentanglement without requiring any training, thereby endowing users with unprecedented controllability for both components. The evaluation of stylization of both appearance and structure, however, remains challenging as it necessitates semantic understanding. To this end, we further propose a Multimodal LLM-based style evaluator, which better aligns with human preferences than metrics lacking semantic understanding. With this powerful evaluator, we conduct extensive analysis, demonstrating that DiffArtist achieves superior style fidelity, editability, and structure-appearance disentanglement. These merits make DiffArtist a highly versatile solution for creative applications. Project homepage: https://github.com/songrise/Artist.

Problem

Research questions and friction points this paper is trying to address.

Simultaneous stylization of image structure and appearance
Dual controllability over structure and appearance in stylization
Evaluating stylization quality with semantic understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual controllable structure and appearance stylization
Separate diffusion processes for disentanglement
Multimodal LLM-based style evaluator

Authors

Ruixia Jiang
The Hong Kong Polytechnic University
Changwen Chen
The Hong Kong Polytechnic University