🤖 AI Summary
Existing artistic style transfer methods rely on fine-tuning, adapters, or prompt engineering, entailing high computational overhead and entangling style with content. This paper proposes LouvreSAE, a lightweight, interpretable sparse autoencoder (SAE) operating in the latent space of generative models. Trained once on artistic data, LouvreSAE disentangles stylistic concepts, including brushstrokes, textures, and color palettes, from structural content in an unsupervised manner, and requires no model fine-tuning or inference-time modifications for transfer. It introduces the first art-specific SAE architecture, yielding human-interpretable, fully decomposable style profile vectors. Evaluated on ArtBench10, LouvreSAE achieves state-of-the-art style fidelity (measured by VGG Style Loss and CLIP Score Style), accelerates inference by 1.7–20× over baselines, and enables precise, plug-and-play style transfer from only a few reference images.
📝 Abstract
Artistic style transfer in generative models remains a significant challenge: existing methods typically introduce style via model fine-tuning, additional adapters, or prompt engineering, all of which can be computationally expensive and may still entangle style with subject matter. In this paper, we introduce a training- and inference-light, interpretable method for representing and transferring artistic style. Our approach applies an art-specific Sparse Autoencoder (SAE) to the latent embeddings of generative image models. Trained on artistic data, the SAE learns an emergent, largely disentangled set of stylistic and compositional concepts, corresponding to style-related elements pertaining to brushwork, texture, and color palette, as well as semantic and structural concepts. We call this model LouvreSAE and use it to construct style profiles: compact, decomposable steering vectors that enable style transfer without any model updates or optimization. Unlike prior concept-based style transfer methods, ours requires no fine-tuning, no LoRA training, and no additional inference passes, enabling direct steering toward artistic styles from only a few reference images. We validate our method on ArtBench10, matching or surpassing existing methods on style evaluations (VGG Style Loss and CLIP Score Style) while being 1.7–20× faster and, critically, interpretable.
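To make the style-profile idea concrete, here is a minimal NumPy sketch of the general recipe the abstract describes: encode a few reference latents with an SAE, average and sparsify the concept activations into a compact profile, and add the decoded profile to a content latent as a steering vector. All names, dimensions, and the random stand-in SAE weights are illustrative assumptions, not the paper's actual architecture or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration, not from the paper).
d_latent, d_concepts = 64, 256

# A pretrained SAE would supply these weights; random stand-ins here.
W_enc = rng.standard_normal((d_latent, d_concepts)) / np.sqrt(d_latent)
W_dec = rng.standard_normal((d_concepts, d_latent)) / np.sqrt(d_concepts)

def sae_encode(z):
    """Sparse, non-negative concept activations for a latent z (ReLU SAE)."""
    return np.maximum(z @ W_enc, 0.0)

def style_profile(reference_latents, top_k=16):
    """Average concept activations over a few reference images and keep
    only the top-k concepts: a compact, decomposable style profile."""
    acts = np.mean([sae_encode(z) for z in reference_latents], axis=0)
    profile = np.zeros_like(acts)
    top = np.argsort(acts)[-top_k:]
    profile[top] = acts[top]
    return profile

def steer(z, profile, alpha=1.0):
    """Add the decoded style direction to a content latent
    (no model updates, no extra inference passes)."""
    return z + alpha * (profile @ W_dec)

# Build a profile from three reference latents and steer a new latent.
refs = [rng.standard_normal(d_latent) for _ in range(3)]
profile = style_profile(refs)
styled = steer(rng.standard_normal(d_latent), profile, alpha=0.8)
```

Because the profile is just a sparse vector over named concepts, individual entries (e.g. a brushwork or palette concept) could in principle be inspected, scaled, or zeroed out independently, which is what makes the representation decomposable and interpretable.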