🤖 AI Summary
This study addresses the limited adaptability of current vision-language models to individual learner characteristics in mathematics education and the absence of a systematic framework for evaluating their instructional adaptiveness. To bridge this gap, the work proposes the first operationalization of learner models from adaptive learning into a multidimensional scoring rubric that assesses response correctness, response quality, and learner alignment, with alignment scored along three dimensions: cognitive level, motivational state, and task complexity. The evaluation integrates both human and automated metrics to holistically gauge model performance. Experimental results reveal significant variability in model adaptiveness across diverse learner profiles, and in particular show that models are unstable at generating instructionally appropriate responses when learner information is scarce. These findings underscore the necessity and efficacy of the proposed evaluation framework for advancing adaptive educational AI systems.
📝 Abstract
Adaptive learning refers to educational technologies that track learners' progress and adapt the instructional process to each learner's performance. It is increasingly recognized as critical for developing effective learning support tools. Vision-language models (VLMs) have seen adoption in mathematics education, and students have been using them as learning aids for personalized instruction. However, it is unknown whether VLMs can adapt to different learner profiles when providing mathematical instruction, and there is currently no systematic framework for evaluating this adaptivity in mathematics tutoring tasks. To address this gap, we draw on the learner model from the adaptive learning framework (Shute and Towle, 2018) and propose a learner model-based rubric. Our rubric formalizes adaptivity assessment into three aspects: cognitive aspects, motivational aspects, and complexity. We also evaluate two additional dimensions of VLM responses: correctness (of answers and solutions) and quality (of the response itself). Our experimental results show measurable differences in adaptivity across models and also reveal that current VLMs struggle to consistently produce learner model-based instructional responses, especially when they receive limited learner information.