🤖 AI Summary
This study addresses the challenge of efficiently and accurately predicting student response behavior on educational platforms by systematically comparing specialized Knowledge Tracing (KT) models against large language models (LLMs) on real-world student interaction data. Through quantitative evaluation of prediction accuracy, inference latency, and deployment cost, the research demonstrates that KT models significantly outperform LLMs on both accuracy and F1 score, run inference several orders of magnitude faster, and incur substantially lower deployment costs. These findings show that, for educational prediction tasks, domain-specific models offer marked advantages over general-purpose LLMs in both performance and economic efficiency, providing empirical support and practical guidance for model selection in personalized learning interventions.
📝 Abstract
Predicting students' future responses to questions is particularly valuable for educational learning platforms, where it enables effective interventions. One of the key approaches to this has been knowledge tracing (KT) models: small, domain-specific, temporal models trained on student question-response data. KT models are optimised for high accuracy on specific educational domains and offer fast inference and scalable deployment. The rise of Large Language Models (LLMs) motivates the following questions: (1) How well can LLMs predict students' future responses to questions? (2) Are LLMs scalable for this domain? (3) How do LLMs compare to KT models on this domain-specific task? In this paper, we compare multiple LLMs and KT models on predictive performance, deployment cost, and inference speed to answer these questions. We show that KT models outperform LLMs on accuracy and F1 score for this domain-specific task. Further, we demonstrate that LLMs are orders of magnitude slower than KT models and cost orders of magnitude more to deploy. This highlights the importance of domain-specific models for educational prediction tasks and shows that current closed-source LLMs should not be treated as a universal solution for all tasks.
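To make concrete what a small, domain-specific KT model looks like, here is a minimal sketch of classic Bayesian Knowledge Tracing (BKT), one of the simplest KT formulations. The parameter values below are illustrative assumptions, not values from this paper; the paper's actual KT models and datasets are not specified here.

```python
def bkt_predict(p_know: float, p_slip: float, p_guess: float) -> float:
    """Probability the student answers correctly, given mastery probability p_know."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess


def bkt_update(p_know: float, correct: bool,
               p_slip: float, p_guess: float, p_transit: float) -> float:
    """Update mastery estimate after observing one response (Bayes rule + learning step)."""
    if correct:
        posterior = p_know * (1 - p_slip) / bkt_predict(p_know, p_slip, p_guess)
    else:
        posterior = p_know * p_slip / (1 - bkt_predict(p_know, p_slip, p_guess))
    # Student may also learn the skill between opportunities.
    return posterior + (1 - posterior) * p_transit


# Illustrative run over a short response sequence (parameters are assumed):
p_know = 0.3  # prior mastery
for correct in [False, True, True]:
    p_know = bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_transit=0.1)
print(f"mastery estimate: {p_know:.3f}")
```

Per-update cost is a handful of arithmetic operations per student per skill, which is why such models are cheap to deploy at scale compared with LLM inference.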