Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional knowledge component (KC) annotation in programming education relies on labor-intensive expert labeling, which limits scalability. Method: KCGen-KT is an end-to-end, large language model (LLM)-driven framework that automatically generates and tags fine-grained KCs for open-ended programming problems and integrates them into knowledge tracing (KT) modeling. Results: On a real-world dataset of student code submissions, KCGen-KT outperforms existing KT methods. Under the performance factor analysis (PFA) model, LLM-generated KCs achieve a level of fit comparable to human-written KCs, and a human evaluation shows the pipeline's KC tagging accuracy is reasonably close to that of human domain experts. Together, these results point toward a fully LLM-empowered pipeline, from automatic KC construction to KT modeling, for scalable, personalized programming education.

📝 Abstract
Knowledge components (KCs) mapped to problems help model student learning by tracking mastery levels on fine-grained skills, thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs for problems, traditionally performed by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework to leverage these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations validating the effectiveness of KCGen-KT. On a real-world dataset of student code submissions to open-ended programming problems, KCGen-KT outperforms existing KT methods. We investigate the learning curves of generated KCs and show that LLM-generated KCs have a comparable level of fit to human-written KCs under the performance factor analysis (PFA) model. We also conduct a human evaluation showing that the KC tagging accuracy of our pipeline is reasonably close to that of human domain experts.
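The PFA model mentioned in the abstract scores the fit of a KC set by predicting response correctness from each student's prior successes and failures on the tagged KCs. A minimal sketch of the standard PFA formulation, with illustrative parameter names (`beta`, `gamma`, `rho` are fitted coefficients, not values from the paper):

```python
import math

def pfa_prob(kc_ids, successes, failures, beta, gamma, rho):
    """Probability of a correct response under Performance Factor Analysis.

    kc_ids: KCs tagged to the problem.
    successes/failures: this student's prior counts of correct/incorrect
        attempts on each KC.
    beta/gamma/rho: fitted per-KC intercept, success weight, failure weight.
    """
    # Logit is a sum of per-KC contributions over the problem's tagged KCs.
    m = sum(beta[k] + gamma[k] * successes[k] + rho[k] * failures[k]
            for k in kc_ids)
    return 1.0 / (1.0 + math.exp(-m))
```

With a positive success weight, the predicted probability rises as the student accumulates correct attempts on a KC, which is what the learning-curve analysis of generated KCs measures.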
Problem

Research questions and friction points this paper is trying to address.

Automated knowledge component (KC) generation
Knowledge tracing (KT) for open-ended coding problems
An LLM-based pipeline enabling personalized learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully automated LLM-based pipeline for KC generation and tagging
LLM-based knowledge tracing framework (KCGen-KT)
Extensive quantitative and qualitative evaluations, including learning-curve and human analyses
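The KC generation step above can be sketched as a prompt-and-parse loop. The prompt wording, the JSON response format, and the helper names below are assumptions for illustration, not the authors' actual prompts:

```python
import json

def build_kc_prompt(problem_statement: str, solution_code: str) -> str:
    """Compose a prompt asking an LLM to list the KCs a problem exercises.

    Hypothetical prompt text; the paper's actual prompts may differ.
    """
    return (
        "You are a programming instructor. List the fine-grained knowledge "
        "components (skills) a student must master to solve this problem.\n"
        f"Problem: {problem_statement}\n"
        f"Reference solution:\n{solution_code}\n"
        'Respond with a JSON array of short KC names, e.g. ["for loops"].'
    )

def parse_kc_response(response_text: str) -> list[str]:
    """Parse and normalize the model's JSON array of KC names."""
    kcs = json.loads(response_text)
    if not isinstance(kcs, list):
        raise ValueError("expected a JSON array of KC names")
    # Lowercase and de-duplicate so the same KC tagged on different
    # problems maps to one identifier for the KT model.
    return sorted({kc.strip().lower() for kc in kcs})
```

The normalized KC names would then serve as the skill identifiers consumed by the downstream KT model.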