🤖 AI Summary
Problem: Existing neural style transfer methods focus primarily on appearance transfer (e.g., color, texture) and neglect controllable structural stylization (e.g., composition, contours). Method: We propose DiffArtist, which is, to the best of our knowledge, the first stylization method to offer dual control over structure and appearance in 2D images. Our approach represents structure and appearance as separate diffusion processes, which disentangles the two components without requiring any training. Because evaluating stylization requires semantic understanding, we also introduce a multimodal large language model (MLLM)-based style evaluator that aligns better with human preferences than metrics lacking such understanding. Contribution/Results: Analysis with this evaluator indicates that DiffArtist achieves superior style fidelity, editability, and structure-appearance disentanglement. The framework supports creative generation and artistic editing without model retraining or fine-tuning.
📝 Abstract
Artistic style includes both structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance features such as color and texture, often neglecting the equally crucial aspect of structural stylization. In this paper, we present a comprehensive study on the simultaneous stylization of structure and appearance of 2D images. Specifically, we introduce DiffArtist, which, to the best of our knowledge, is the first stylization method to allow for dual controllability over structure and appearance. Our key insight is to represent structure and appearance as separate diffusion processes to achieve complete disentanglement without requiring any training, thereby endowing users with unprecedented controllability for both components. The evaluation of stylization of both appearance and structure, however, remains challenging as it necessitates semantic understanding. To this end, we further propose a Multimodal LLM-based style evaluator, which better aligns with human preferences than metrics lacking semantic understanding. With this powerful evaluator, we conduct extensive analysis, demonstrating that DiffArtist achieves superior style fidelity, editability, and structure-appearance disentanglement. These merits make DiffArtist a highly versatile solution for creative applications. Project homepage: https://github.com/songrise/Artist.