AI Summary
This work addresses artistic style clustering, a largely unexplored visual analysis task, by introducing the first unified evaluation framework. It systematically benchmarks neural representations from style classification networks, neural style transfer features, and multimodal foundation models (e.g., CLIP, ViT-Large) for cross-dataset style clustering. The paper formally defines the style clustering problem for the first time, empirically reveals disparities in stylistic semantic modeling capacity across representation types, and proposes a reproducible, extensible benchmarking protocol. Experiments on both real-world artist datasets and synthetic style benchmarks demonstrate that multimodal features substantially outperform single-task representations, achieving up to 23% absolute improvement in normalized mutual information (NMI) and adjusted Rand index (ARI). All code, evaluation protocols, and benchmark datasets are publicly released.
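For readers unfamiliar with the two metrics named above, the following is a minimal, standard-library-only sketch of how NMI and ARI compare predicted cluster assignments against ground-truth style labels. The function names and the toy labels are hypothetical, and the arithmetic-mean normalization for NMI is an assumption; the summary does not state which NMI variant the paper uses, so this is not the authors' evaluation code.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected pair-counting agreement between two labelings."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as trivially identical

    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Pairs of items placed together in both labelings, and in each one alone.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_joint - expected) / (max_index - expected)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a labeling given its cluster sizes."""
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies.

    The arithmetic-mean normalizer is an assumption made for this sketch.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    mi = sum(
        (c / n) * log(c * n / (true_counts[a] * pred_counts[b]))
        for (a, b), c in joint.items()
    )
    normalizer = (_entropy(true_counts, n) + _entropy(pred_counts, n)) / 2
    if normalizer == 0:
        return 1.0  # both labelings are a single cluster
    return mi / normalizer


# Toy example with hypothetical style labels and cluster ids.
styles = ["cubism", "cubism", "impressionism", "impressionism"]
perfect_clusters = [1, 1, 0, 0]    # same partition, different cluster names
unrelated_clusters = [0, 1, 0, 1]  # splits every style across both clusters

ari_perfect = adjusted_rand_index(styles, perfect_clusters)
nmi_perfect = normalized_mutual_info(styles, perfect_clusters)
ari_unrelated = adjusted_rand_index(styles, unrelated_clusters)
nmi_unrelated = normalized_mutual_info(styles, unrelated_clusters)
```

Both metrics are invariant to how clusters are named, which is why they suit clustering evaluation: a partition that matches the style labels scores 1 regardless of the cluster ids, while ARI can drop below 0 when agreement is worse than chance.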
Abstract
Clustering artworks by style has many potential real-world applications, such as art recommendation, style-based search and retrieval, and the study of how artistic style evolves for a single artist or within an artwork corpus. We introduce and discuss the notion of 'style-based clustering of visual artworks' and argue that it remains a largely unaddressed problem. We explore and devise different neural feature representations, ranging from style classification and style transfer to large vision-language models, that can then be used for style-based clustering. Our objective is to assess the relative effectiveness of these approaches through qualitative and quantitative analysis, applying them to multiple artwork corpora and curated synthetically styled datasets. Besides providing a broad framework for style-based clustering and evaluation, our analysis offers novel insights into feature representations, architectures, and their implications for style-based clustering.