ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
The pedagogical assessment capabilities of current multimodal large language models (MLLMs) have not been systematically evaluated in authentic art-education settings, and existing evaluation methods fail to address multidimensional instructional requirements. Method: We propose a process-oriented human–computer interaction (HCI) evaluation framework, construct the first multidimensional art assessment dataset comprising 380 real-world teacher–model dialogues, and design a modular, iteratively upgradable MLLM agent architecture. Leveraging GPT-4o, our system integrates entity recognition, controllable generation, and interactive modeling to deliver nine-dimensional art assessments. Contribution/Results: Experiments demonstrate the consistency and pedagogical utility of the generated evaluations; the system significantly improves teachers' feedback efficiency; and the established benchmark enables reproducible, quantitative evaluation of MLLMs' instructional capabilities, addressing a critical gap in both MLLM assessment and application within art education.

📝 Abstract
Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, function as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of various scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design to facilitate more accurate MLLM assessment and development. This approach aids teachers in efficient art evaluation while also recording interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset consists of 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and suggestion generation, enabling iterative upgrades. Machine learning and natural language processing techniques ensure the reliability of evaluations. The results confirm GPT-4o's effectiveness in assisting teachers in art evaluation dialogues. Our contributions are available at https://artmentor.github.io/.
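The modular system described in the abstract (agents for entity recognition, review generation, and suggestion generation, wired so each module can be upgraded independently) can be sketched as a simple pipeline. Everything below is an illustrative assumption, not the authors' code: the class names, the toy string heuristics standing in for GPT-4o calls, and the three sample dimensions (the paper uses nine) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    entities: list[str]
    review: str
    suggestions: list[str]


class EntityAgent:
    """Stand-in for an MLLM call that tags salient entities in an artwork."""

    def run(self, description: str) -> list[str]:
        # Toy heuristic: treat capitalized tokens as candidate entities.
        return sorted({w.strip(".,") for w in description.split() if w[:1].isupper()})


class ReviewAgent:
    """Stand-in for review generation conditioned on recognized entities."""

    def run(self, entities: list[str]) -> str:
        return "The work features " + ", ".join(entities) + "."


class SuggestionAgent:
    """Stand-in for suggestion generation along assessment dimensions."""

    def __init__(self, dimensions: list[str]):
        self.dimensions = dimensions

    def run(self, review: str) -> list[str]:
        return [f"Consider refining {d}." for d in self.dimensions]


class ArtMentorPipeline:
    """Chains the three agents; each module is swappable, mirroring the
    'iterative upgrades' design described in the abstract."""

    def __init__(self, entity: EntityAgent, review: ReviewAgent,
                 suggestion: SuggestionAgent):
        self.entity, self.review, self.suggestion = entity, review, suggestion

    def evaluate(self, description: str) -> Evaluation:
        entities = self.entity.run(description)
        review = self.review.run(entities)
        suggestions = self.suggestion.run(review)
        return Evaluation(entities, review, suggestions)


# Hypothetical subset of the nine assessment dimensions.
DIMENSIONS = ["composition", "color", "line"]

pipeline = ArtMentorPipeline(EntityAgent(), ReviewAgent(), SuggestionAgent(DIMENSIONS))
result = pipeline.evaluate("A Watercolor landscape with Mountains and a River.")
```

In the real system each `run` would be a GPT-4o call on the artwork image and dialogue history; the point of the sketch is only the modular chaining, which lets one agent be replaced without touching the others.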
Problem

Research questions and friction points this paper is trying to address.

Evaluate MLLMs as art assistants
Improve MLLM assessment methods
Develop ArtMentor for art evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models
Process-oriented HCI space
Integrated dataset and systems
Chanjin Zheng
East China Normal University
educational measurement; psychometrics; applied statistics
Zengyi Yu
Faculty of Education, East China Normal University, Shanghai, China; College of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, China
Yilin Jiang
College of Education, Zhejiang University of Technology, Hangzhou, Zhejiang, China
Mingzi Zhang
Faculty of Education, East China Normal University, Shanghai, China; College of Education, Zhejiang Normal University, Jinhua, Zhejiang, China
Xunuo Lu
School of Economics, Zhejiang University of Technology, Hangzhou, Zhejiang, China
Jing Jin
School of Education, Zhejiang Normal University, Jinhua, Zhejiang, China; Tianchang Guanchao Primary School, Hangzhou, Zhejiang, China
Liteng Gao
School of Artificial Intelligence Science and Technology, University of Shanghai for Science and Technology, Shanghai, China