ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding

📅 2025-07-19
🤖 AI Summary
Current image aesthetic assessment (IAA) methods suffer from modality bias—producing either scalar scores or textual descriptions—and lack fine-grained attribute decomposition, hindering deep aesthetic analysis. To address this, we propose the first multimodal aesthetic assessment model capable of jointly generating quantitative scores and expert-level semantic interpretations. We introduce ArtiMuse-10K, a high-fidelity dataset comprising 10,000 images annotated with both holistic aesthetic scores and eight professional aesthetic attributes (e.g., composition, color harmony, visual balance). Our model is built upon a multimodal large language model (MLLM) and trained end-to-end to unify score prediction and attribute reasoning in a single forward pass. Extensive experiments demonstrate substantial improvements in aesthetic perception accuracy and cross-domain generalization across diverse image categories. ArtiMuse-10K establishes a new benchmark for IAA research, offering high-quality, interpretable, and attribute-rich supervision.

📝 Abstract
The rapid advancement of educational applications, artistic creation, and AI-generated content (AIGC) technologies has substantially increased practical requirements for comprehensive Image Aesthetics Assessment (IAA), particularly demanding methods capable of delivering both quantitative scoring and professional understanding. Multimodal Large Language Model (MLLM)-based IAA methods demonstrate stronger perceptual and generalization capabilities compared to traditional approaches, yet they suffer from modality bias (score-only or text-only) and lack fine-grained attribute decomposition, thereby failing to support further aesthetic assessment. In this paper, we present: (1) ArtiMuse, an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities; (2) ArtiMuse-10K, the first expert-curated image aesthetic dataset comprising 10,000 images spanning 5 main categories and 15 subcategories, each annotated by professional experts with an 8-dimensional attribute analysis and a holistic score. Both the model and dataset will be made public to advance the field.
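
To make the annotation structure described in the abstract concrete, here is a minimal sketch of what one ArtiMuse-10K record might look like. It is only an illustration: the class name, field names, placeholder attribute keys, and the example score value are all assumptions, since the abstract gives only the counts (5 main categories, 15 subcategories, 8 attribute dimensions, one holistic score) and not the released schema.

```python
from dataclasses import dataclass, field
from typing import Dict

# From the abstract: each image carries an 8-dimensional attribute analysis.
NUM_ATTRIBUTES = 8


@dataclass
class AestheticAnnotation:
    """Hypothetical record for one expert-annotated image (not the official schema)."""

    image_id: str
    main_category: str     # one of 5 main categories (names not given in the abstract)
    subcategory: str       # one of 15 subcategories (names not given in the abstract)
    holistic_score: float  # score scale is not stated in the abstract
    # Maps an attribute name to the expert's textual analysis of that attribute.
    attribute_analysis: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that exactly eight attribute analyses are present."""
        if len(self.attribute_analysis) != NUM_ATTRIBUTES:
            raise ValueError(
                f"expected {NUM_ATTRIBUTES} attributes, "
                f"got {len(self.attribute_analysis)}"
            )


# Usage with placeholder attribute names; the real names are not listed here.
record = AestheticAnnotation(
    image_id="img_0001",
    main_category="example_category",
    subcategory="example_subcategory",
    holistic_score=7.5,
    attribute_analysis={f"attribute_{i}": "expert note" for i in range(1, 9)},
)
record.validate()  # passes: eight attribute entries are present
```

Keeping the score and the per-attribute text in one record mirrors the joint scoring and understanding setup the abstract describes, where both kinds of supervision refer to the same image.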
Problem

Research questions and friction points this paper is trying to address.

Develops fine-grained image aesthetics assessment with scoring and expert analysis
Addresses modality bias in current MLLM-based IAA methods
Introduces first expert-curated dataset for comprehensive aesthetic evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLM-based joint scoring and expert understanding
Fine-grained attribute decomposition for aesthetics
Expert-curated dataset with multi-dimensional annotations
Authors

Shuo Cao
USTC, Shanghai AI Lab
Nan Ma
China Academy of Art
Jiayang Li
Peking University
Xiaohui Li
SJTU, Shanghai AI Lab
Lihao Shao
China Academy of Art
Kaiwen Zhu
Shanghai Jiao Tong University
Multi-Modal Generation, Computer Vision
Yu Zhou
Sun Yat-sen University
Yuandong Pu
SJTU, Shanghai AI Laboratory
Computer Vision
Jiarui Wu
MMLab, The Chinese University of Hong Kong
Machine Learning, Computer Vision
Jiaquan Wang
Hong Kong PolyU
Bo Qu
Shanghai AI Lab
Wenhai Wang
Shanghai AI Lab, CUHK
Yu Qiao
Shanghai AI Lab
Dajuin Yao
China Academy of Art
Yihao Liu
Shanghai AI Lab