🤖 AI Summary
Existing knowledge tracing (KT) methods rely heavily on labor-intensive, error-prone manual annotation of knowledge concepts (KCs), and they neglect the semantic relationships between questions and KCs. To address this, the authors propose KCQRL, an LLM-driven framework that generates stepwise question solutions and automatically annotates the KCs in each solution step, eliminating the need for expert labeling. KCQRL further introduces a contrastive learning mechanism with false-negative elimination to achieve fine-grained semantic alignment among questions, solution steps, and KCs. Because the resulting embeddings simply replace the randomly initialized ones in existing architectures, KCQRL is compatible with mainstream KT models: integrated into 15 representative models and evaluated on two large-scale real-world mathematics learning datasets, it consistently delivers significant gains, with average AUC improvements of 0.8–1.5%, improving both modeling accuracy and generalization across datasets.
📝 Abstract
Knowledge tracing (KT) is a popular approach for modeling students' learning progress over time, which can enable more personalized and adaptive learning. However, existing KT approaches face two major limitations: (1) they rely heavily on expert-defined knowledge concepts (KCs) in questions, which is time-consuming and prone to errors; and (2) KT methods tend to overlook the semantics of both questions and the given KCs. In this work, we address these challenges and present KCQRL, a framework for automated knowledge concept annotation and question representation learning that can improve the effectiveness of any existing KT model. First, we propose an automated KC annotation process using large language models (LLMs), which generates question solutions and then annotates KCs in each solution step of the questions. Second, we introduce a contrastive learning approach to generate semantically rich embeddings for questions and solution steps, aligning them with their associated KCs via a tailored false negative elimination approach. These embeddings can be readily integrated into existing KT models, replacing their randomly initialized embeddings. We demonstrate the effectiveness of KCQRL across 15 KT algorithms on two large real-world Math learning datasets, where we achieve consistent performance improvements.
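The contrastive alignment with false-negative elimination can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: it assumes an InfoNCE-style objective over in-batch pairs of question and KC embeddings, where any in-batch "negative" that shares the anchor's KC label is masked out rather than pushed apart. The function name and signature are hypothetical.

```python
import numpy as np

def contrastive_loss_fn_elim(q_emb, kc_emb, kc_labels, temperature=0.1):
    """InfoNCE-style loss aligning question embeddings with KC embeddings.

    In-batch negatives sharing the anchor's KC label are masked out
    (false-negative elimination). Hypothetical sketch of the idea, not
    the paper's exact objective.
    """
    # Normalize so dot products become cosine similarities.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    k = kc_emb / np.linalg.norm(kc_emb, axis=1, keepdims=True)
    sim = (q @ k.T) / temperature  # (B, B) similarity matrix

    labels = np.asarray(kc_labels)
    # Entry (i, j) is a false negative if question j has the same KC as
    # question i but is not the positive pair itself (the diagonal).
    fn_mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    sim = np.where(fn_mask, -np.inf, sim)  # exp(-inf) = 0: removed from softmax

    # Cross-entropy with the diagonal (true question-KC pair) as target.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

In the full framework, analogous alignment terms would tie solution-step embeddings to their annotated KCs as well, and the trained question embeddings would then replace the randomly initialized embedding tables of a downstream KT model.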