🤖 AI Summary
Current image aesthetic assessment (IAA) methods suffer from modality bias (they produce either scalar scores or textual descriptions, but not both) and lack fine-grained attribute decomposition, which hinders deep aesthetic analysis. To address this, we propose ArtiMuse, the first multimodal aesthetic assessment model capable of jointly generating quantitative scores and expert-level semantic interpretations. We introduce ArtiMuse-10K, a high-fidelity dataset comprising 10,000 images annotated with both holistic aesthetic scores and eight professional aesthetic attributes (e.g., composition, color harmony, visual balance). Our model is built upon a multimodal large language model (MLLM) and trained end-to-end to unify score prediction and attribute reasoning in a single forward pass. Extensive experiments demonstrate substantial improvements in aesthetic perception accuracy and cross-domain generalization across diverse image categories. ArtiMuse-10K establishes a new benchmark for IAA research, offering high-quality, interpretable, and attribute-rich supervision.
📝 Abstract
The rapid advancement of educational applications, artistic creation, and AI-generated content (AIGC) technologies has substantially increased the practical demand for comprehensive Image Aesthetics Assessment (IAA), particularly for methods that deliver both quantitative scoring and professional understanding. Multimodal Large Language Model (MLLM)-based IAA methods demonstrate stronger perceptual and generalization capabilities than traditional approaches, yet they suffer from modality bias (score-only or text-only) and lack fine-grained attribute decomposition, and therefore fail to support deeper aesthetic analysis. In this paper, we present: (1) ArtiMuse, an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities; (2) ArtiMuse-10K, the first expert-curated image aesthetic dataset, comprising 10,000 images spanning 5 main categories and 15 subcategories, each annotated by professional experts with an 8-dimensional attribute analysis and a holistic score. Both the model and dataset will be made public to advance the field.