ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding

📅 2025-07-19

🤖 AI Summary

Current image aesthetic assessment (IAA) methods suffer from modality bias (producing either scalar scores or textual descriptions, but not both) and lack fine-grained attribute decomposition, hindering deep aesthetic analysis. To address this, we propose the first multimodal aesthetic assessment model capable of jointly generating quantitative scores and expert-level semantic interpretations. We introduce ArtiMuse-10K, a high-fidelity dataset comprising 10,000 images annotated with both holistic aesthetic scores and eight professional aesthetic attributes (e.g., composition, color harmony, visual balance). Our model is built upon a multimodal large language model (MLLM) and trained end-to-end to unify score prediction and attribute reasoning in a single forward pass. Extensive experiments demonstrate substantial improvements in aesthetic perception accuracy and cross-domain generalization across diverse image categories. ArtiMuse-10K establishes a new benchmark for IAA research, offering high-quality, interpretable, and attribute-rich supervision.

📝 Abstract

The rapid advancement of educational applications, artistic creation, and AI-generated content (AIGC) technologies has substantially increased the practical demand for comprehensive Image Aesthetics Assessment (IAA), particularly for methods capable of delivering both quantitative scoring and professional understanding. Multimodal Large Language Model (MLLM)-based IAA methods demonstrate stronger perceptual and generalization capabilities than traditional approaches, yet they suffer from modality bias (score-only or text-only output) and lack fine-grained attribute decomposition, and therefore cannot support deeper aesthetic assessment. In this paper, we present: (1) ArtiMuse, an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities; and (2) ArtiMuse-10K, the first expert-curated image aesthetic dataset, comprising 10,000 images spanning 5 main categories and 15 subcategories, each annotated by professional experts with an 8-dimensional attribute analysis and a holistic score. Both the model and the dataset will be made public to advance the field.
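To make the annotation structure described in the abstract concrete, here is a minimal sketch of one ArtiMuse-10K record: a category, a subcategory, an expert analysis for each of the 8 attributes, and a holistic score. The class and field names are hypothetical, not the released format. The page names neither the 5 categories, the 15 subcategories, nor all 8 attributes, nor the score range, so the sketch validates only the attribute count and that each analysis is non-empty.

```python
from dataclasses import dataclass

NUM_ATTRIBUTES = 8  # the abstract states an 8-dimensional attribute analysis


@dataclass
class AestheticRecord:
    # Hypothetical schema for illustration; not the dataset's published format.
    image_path: str
    category: str             # one of the 5 main categories (names not given here)
    subcategory: str          # one of the 15 subcategories (names not given here)
    attributes: dict[str, str]  # attribute name -> expert analysis text
    holistic_score: float     # score range is not stated, so it is not checked

    def __post_init__(self):
        # Every record must carry exactly one analysis per attribute dimension.
        if len(self.attributes) != NUM_ATTRIBUTES:
            raise ValueError(
                f"expected {NUM_ATTRIBUTES} attributes, got {len(self.attributes)}"
            )
        # Reject attributes whose expert analysis is blank.
        empty = [name for name, text in self.attributes.items() if not text.strip()]
        if empty:
            raise ValueError(f"empty analysis for: {empty}")
```

Pairing the eight text analyses with a single numeric score in one record mirrors the joint "scoring plus understanding" supervision the abstract describes, as opposed to the score-only or text-only data it criticizes.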

Problem

Research questions and friction points this paper is trying to address.

Develops fine-grained image aesthetics assessment with scoring and expert analysis

Addresses modality bias in current MLLM-based IAA methods

Introduces first expert-curated dataset for comprehensive aesthetic evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLM-based joint scoring and expert understanding

Fine-grained attribute decomposition for aesthetics

Expert-curated dataset with multi-dimensional annotations
