🤖 AI Summary
This work addresses three limitations of existing diffusion-based style transfer methods: semantic misalignment between the style reference and the content, reliance on external constraints such as semantic masks, and the absence of adaptive global-local alignment mechanisms, all of which hinder precise and flexible personalization. The paper proposes StyleGallery, a training-free, semantic-aware style transfer framework that achieves semantic-adaptive alignment with arbitrary style references. It combines adaptive clustering of latent diffusion features, block-wise filtered feature matching, and energy-guided regional style optimization, which removes the dependence on external annotations and enables interpretable control under multiple style references while preserving high-fidelity content-style fusion. Evaluated on a newly constructed benchmark, the approach outperforms state-of-the-art methods in structural preservation, regional stylization quality, and personalized customization.
📝 Abstract
Despite advances in diffusion-based image style transfer, existing methods are commonly limited by 1) a semantic gap: the style reference may lack the appropriate content semantics, causing uncontrollable stylization; 2) reliance on extra constraints (e.g., semantic masks), which restricts applicability; and 3) rigid feature associations that lack adaptive global-local alignment and therefore fail to balance fine-grained stylization with global content preservation. These limitations, particularly the inability to flexibly leverage style inputs, fundamentally restrict style transfer in terms of personalization, accuracy, and adaptability. To address these limitations, we propose StyleGallery, a training-free and semantic-aware framework that supports arbitrary reference images as input and enables effective personalized customization. It comprises three core stages: semantic region segmentation (adaptive clustering on latent diffusion features to divide regions without extra inputs); clustered region matching (block filtering on extracted features for precise alignment); and style transfer optimization (energy function-guided diffusion sampling with a regional style loss to optimize stylization). Experiments on our introduced benchmark demonstrate that StyleGallery outperforms state-of-the-art methods in content structure preservation, regional stylization, interpretability, and personalized customization, particularly when leveraging multiple style references.
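To make the three stages more concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes per-pixel features have already been extracted and flattened to an (N, C) array. It swaps in plain k-means with a fixed number of clusters for the adaptive clustering, nearest-centroid cosine matching for the block-filtered matching, and a Gram-matrix difference for the regional style loss. All function names (`cluster_regions`, `match_regions`, `regional_style_loss`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def farthest_point_init(features, k):
    """Deterministic seeding: start at row 0, then repeatedly add the farthest row."""
    chosen = [0]
    dist = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, ((features - features[nxt]) ** 2).sum(axis=1))
    return features[chosen].copy()


def assign(features, centroids):
    """Label each row with the index of its nearest centroid (squared Euclidean)."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def cluster_regions(features, k, iters=10):
    """Stage 1 stand-in: k-means over (N, C) features with a fixed k (an assumption).

    Returns region labels of shape (N,) and centroids of shape (k, C).
    """
    features = np.asarray(features, dtype=float)
    centroids = farthest_point_init(features, k)
    for _ in range(iters):
        labels = assign(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assign(features, centroids), centroids


def match_regions(content_centroids, style_centroids, eps=1e-8):
    """Stage 2 stand-in: map each content region to its most cosine-similar style region."""
    a = content_centroids / (np.linalg.norm(content_centroids, axis=1, keepdims=True) + eps)
    b = style_centroids / (np.linalg.norm(style_centroids, axis=1, keepdims=True) + eps)
    return (a @ b.T).argmax(axis=1)


def gram(x):
    """Channel-by-channel Gram matrix of (M, C) features, normalized by region size."""
    return x.T @ x / max(len(x), 1)


def regional_style_loss(content_feats, content_labels, style_feats, style_labels, match):
    """Stage 3 stand-in: sum of Gram-matrix gaps between matched regions."""
    loss = 0.0
    for ci, si in enumerate(match):
        c = content_feats[content_labels == ci]
        s = style_feats[style_labels == si]
        if len(c) and len(s):
            loss += float(np.mean((gram(c) - gram(s)) ** 2))
    return loss
```

In an energy-guided sampler, the gradient of a loss like `regional_style_loss` with respect to the current latent would steer each denoising step; computing that gradient needs an autodiff framework and is left out of this sketch. With several style references, one possible extension is to concatenate their region centroids before calling `match_regions`, so each content region can choose among regions from every reference.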