🤖 AI Summary
The pedagogical assessment capabilities of current multimodal large language models (MLLMs) have not been systematically evaluated in authentic art education settings, and existing evaluation methods fail to address multidimensional instructional requirements.
Method: We propose a process-oriented human–computer interaction (HCI) evaluation framework, construct the first multidimensional art assessment dataset comprising 380 real-world teacher–model dialogues, and design a modular, iteratively upgradable MLLM agent architecture. Leveraging GPT-4o, our system integrates entity recognition, controllable generation, and interactive modeling to deliver nine-dimensional art assessments.
Contribution/Results: Experiments show that the generated evaluations achieve strong consistency and pedagogical utility; the system significantly improves teachers' feedback efficiency; and the established benchmark enables reproducible, quantitative evaluation of MLLMs' instructional capabilities, addressing a critical gap in both the assessment and the application of MLLMs in art education.
📝 Abstract
Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, function as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of realistic evaluation scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design to enable more accurate MLLM assessment and development. This approach helps teachers evaluate art efficiently while also recording their interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset consists of 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and suggestion generation, enabling iterative upgrades. Machine learning and natural language processing techniques ensure the reliability of evaluations. The results confirm GPT-4o's effectiveness in assisting teachers in art evaluation dialogues. Our contributions are available at https://artmentor.github.io/.
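The modular agent design described above (separate agents for entity recognition, review generation, and suggestion generation, each independently replaceable) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, prompt templates, and the `stub_model` placeholder (standing in for a real MLLM call such as GPT-4o) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A model backend is any function mapping a prompt to a response string.
# In a real system this would wrap an MLLM API call; here it is stubbed.
ModelFn = Callable[[str], str]

@dataclass
class Agent:
    """One modular capability: a named prompt template plus a model backend."""
    name: str
    prompt_template: str
    model: ModelFn

    def run(self, artwork_description: str) -> str:
        return self.model(self.prompt_template.format(input=artwork_description))

@dataclass
class ArtMentorPipeline:
    """Holds the agent set; re-registering a name swaps that agent in place,
    which is one simple way to support iterative upgrades."""
    agents: List[Agent] = field(default_factory=list)

    def register(self, agent: Agent) -> None:
        self.agents = [a for a in self.agents if a.name != agent.name] + [agent]

    def evaluate(self, artwork_description: str) -> Dict[str, str]:
        # Run every registered agent on the same artwork description.
        return {a.name: a.run(artwork_description) for a in self.agents}

def stub_model(prompt: str) -> str:
    # Placeholder for a real MLLM call (e.g., GPT-4o via an API client).
    return f"[model output for: {prompt}]"

pipeline = ArtMentorPipeline()
pipeline.register(Agent("entity_recognition", "List the entities in: {input}", stub_model))
pipeline.register(Agent("review_generation", "Write a review of: {input}", stub_model))
pipeline.register(Agent("suggestion_generation", "Suggest improvements for: {input}", stub_model))

result = pipeline.evaluate("a student watercolor of a harbor at dusk")
```

Keeping each agent behind the same `run` interface means a single agent can be upgraded (new prompt, new model backend) without touching the rest of the pipeline, which matches the iterative-upgrade property the system is described as having.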