AI Summary
Knowledge tracing (KT) relies heavily on manually annotated knowledge components (KCs) and domain-specific statistical rules, hindering adaptation to AI-generated educational content. Method: We propose the first zero-shot and few-shot framework for automatic KC extraction from multimodal questions, leveraging instruction-tuned multimodal large language models (e.g., LLaVA, Qwen-VL). Our approach integrates cross-modal alignment with knowledge graph-guided clustering to yield interpretable and transferable KCs. Contribution/Results: Evaluated on a five-subject KT benchmark, our automatically extracted KCs incur only a 1.2% AUC drop compared to human-annotated labels. In few-shot settings, our method significantly improves the interpretability of assessments. Crucially, it eliminates dependence on expert annotation and handcrafted domain rules, enabling scalable, automated assessment in low-resource educational settings.
Abstract
Knowledge tracing models have enabled a range of intelligent tutoring systems to provide feedback to students. However, existing methods for knowledge tracing in the learning sciences rely predominantly on statistical data and instructor-defined knowledge components, making it challenging to integrate AI-generated educational content with these established methods. We propose a method for automatically extracting knowledge components from educational content using instruction-tuned large multimodal models. We validate this approach by comprehensively evaluating it against knowledge tracing benchmarks in five domains. Our results indicate that the automatically extracted knowledge components can effectively replace human-tagged labels, offering a promising direction for enhancing intelligent tutoring systems in limited-data scenarios, achieving more explainable assessments in educational settings, and laying the groundwork for automated assessment.
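To make the zero-shot KC-extraction idea concrete, the following is a minimal sketch, not the paper's actual pipeline: it composes a prompt asking an instruction-tuned model to name the knowledge components a question exercises, and parses the line-per-KC reply. The function names, prompt wording, and the mocked model reply are all illustrative assumptions; in practice the prompt would be sent to a multimodal model such as LLaVA or Qwen-VL along with any question image.

```python
# Hedged sketch of zero-shot KC extraction (illustrative only; not the
# authors' implementation). No real model is called here: the reply is mocked.

def build_kc_prompt(question_text: str, subject: str, max_kcs: int = 3) -> str:
    """Compose a zero-shot KC-extraction prompt for a single question."""
    return (
        f"You are an expert {subject} instructor.\n"
        f"List up to {max_kcs} knowledge components (skills or concepts) a "
        "student must know to answer the question below. "
        "Reply with one short KC name per line.\n\n"
        f"Question: {question_text}"
    )

def parse_kcs(model_reply: str) -> list[str]:
    """Turn a line-per-KC model reply into a clean list of KC labels."""
    return [
        line.strip("-* ").strip()
        for line in model_reply.splitlines()
        if line.strip()
    ]

# Hypothetical usage with a mocked model reply:
prompt = build_kc_prompt("Solve for x: 2x + 3 = 11", subject="algebra")
reply = "- solving linear equations\n- inverse operations"
print(parse_kcs(reply))  # → ['solving linear equations', 'inverse operations']
```

In a few-shot variant, worked question-to-KC examples would simply be prepended to the same prompt; the extracted labels could then feed a clustering step to merge near-duplicate KCs across questions.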