🤖 AI Summary
This paper addresses zero-shot generalized image style transfer—transferring images to diverse target styles (e.g., 3D, flat, abstract, fine-grained) across domains without test-time optimization. We propose the first end-to-end generalizable framework for this task. Methodologically, we introduce: (1) a style-decoupled training strategy that enforces orthogonal content and style representations; (2) StyleGallery, a large-scale, structured style dataset enabling semantic style alignment and cross-style generalization; and (3) a content-fusion encoder that enhances image-driven style adaptation. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art methods on multi-style transfer benchmarks, achieving superior performance without any test-time fine-tuning. The framework exhibits strong generalization to unseen styles and domains, validating its effectiveness in practical zero-shot settings.
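The exact decoupling objective is not specified here; purely as an illustration, a loss that pushes style and content embeddings toward orthogonality (the property stated above) could be sketched as follows. All names and the specific formulation are hypothetical, not the paper's actual method:

```python
import numpy as np

def decoupling_loss(style_emb: np.ndarray, content_emb: np.ndarray) -> float:
    """Hypothetical orthogonality penalty between two embedding vectors.

    Returns the squared cosine similarity: 0 when the embeddings are
    orthogonal (fully decoupled), 1 when they are parallel (fully entangled).
    """
    cos = np.dot(style_emb, content_emb) / (
        np.linalg.norm(style_emb) * np.linalg.norm(content_emb)
    )
    return float(cos ** 2)

# Orthogonal embeddings incur no penalty; parallel ones incur the maximum.
print(decoupling_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
print(decoupling_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # 1.0
```

In a real training loop such a term would be added to the main generation loss, encouraging the style encoder to discard content information.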
📝 Abstract
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, while StyleGallery enables the generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The project page is available at: https://styleshot.github.io/.